Preface



This book deals with some basic themes in mathematical analysis along the 
lines of classical norms on functions and sequences, general normed vector 
spaces, inner product spaces, linear operators, some maximal and square- 
function operators, interpolation of operators, and quasisymmetric mappings 
between metric spaces. Aspects of the broad area of harmonic analysis are 
entailed in particular, involving famous work of M. Riesz, Hardy, Littlewood, 
Paley, Calderón, and Zygmund.

However, instead of working with arbitrary continuous or integrable func- 
tions, we shall often be ready to use only step functions on an interval, 
i.e., functions which are piecewise-constant. Similarly, instead of infinite- 
dimensional Hilbert or Banach spaces, we shall frequently restrict our atten- 
tion to finite-dimensional inner product or normed vector spaces. We shall, 
however, be interested in quantitative matters. 

We do not attempt to be exhaustive in any way, and there are many re- 
lated and very interesting subjects that are not addressed. The bibliography 
lists a number of books and articles with further information. 

The formal prerequisites for this book are quite limited. Much of what we 
do is connected to the notion of integration, but for step functions ordinary 
integrals reduce to finite sums. A sufficient background should be provided by 
standard linear algebra of real and complex finite-dimensional vector spaces 
and some knowledge of beginning analysis, as in the first few chapters of 
Rudin's celebrated Principles of Mathematical Analysis [Rud1]. This is not
to say that the present monograph would necessarily be easy to read with 
this background, as the types of issues considered may be unfamiliar. On the 
other hand, it is hoped that this monograph can be helpful to readers with 
a variety of perspectives. 
Chapter 1

Notation and conventions 



If a and b are real numbers with $a \le b$, then the following are the
intervals in the real line $\mathbf{R}$ with endpoints a and b:

    $[a, b] = \{x \in \mathbf{R} : a \le x \le b\}$

    $(a, b) = \{x \in \mathbf{R} : a < x < b\}$

    $[a, b) = \{x \in \mathbf{R} : a \le x < b\}$

    $(a, b] = \{x \in \mathbf{R} : a < x \le b\}$



All but the first is the empty set when a = b, while [a, b] consists of the 
one point a = b. In general, the first of these intervals is called the closed 
interval with endpoints a and b, and the second is the open interval with 
endpoints a and b. The third and fourth are half-open, half-closed intervals, 
with the third being left-closed and right-open, and the fourth left-open and 
right-closed. 

The length of each of these intervals is defined to be $b - a$. If an interval
is denoted I, we may write $|I|$ for the length of I.

For the record, see Chapter 1 in [Rud1] concerning detailed properties of
the real numbers (as well as the complex numbers $\mathbf{C}$). In particular,
let us recall the "least upper bound" or "completeness" property, to the
effect that a nonempty set E of real numbers which has an upper bound has a
least upper bound. The least upper bound is also called the supremum of E,
and is denoted $\sup E$. Similarly, if F is a nonempty set of real numbers
which has a lower bound, then F has a greatest lower bound, or infimum, which
is denoted $\inf F$. We shall sometimes use the extended real numbers (as in
[Rud1]), with $\infty$ and $-\infty$ added to the real numbers, and write
$\sup E = \infty$ and $\inf F = -\infty$ if E and F are nonempty sets of real
numbers such that E does not have an upper bound and F does not have a lower
bound.

If A is a subset of some specified set X (like the real line), we let
$\mathbf{1}_A(x)$ denote the indicator function of A on X (sometimes called
the characteristic function associated to A, although in other contexts this
name can be used for something quite different). This is the function which
is equal to 1 when $x \in A$, and is equal to 0 when $x \in X \setminus A$.

Definition 1.1 (Step functions) A function on the real line, or on an in- 
terval in the real line, is called a step function if it is a finite linear combi- 
nation of indicator functions of intervals. 

This is equivalent to saying that there is a partition of the domain into 
intervals on which the function is constant. 

In these notes, one is normally welcome to assume that a given function 
on the real line, or on an interval in the real line, is a step function. In fact, 
one is normally welcome to assume that a given function is a dyadic step 
function, as defined in the next chapter. For step functions, it is very easy to 
define the integral over an interval in the domain of definition, by reducing 
to linear combinations of lengths of intervals. 
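
As an informal illustration of the last remark, the following Python sketch integrates a step function by summing value times length over its pieces. The representation of a step function as a list of `(left, right, value)` triples and the name `integrate_step` are assumptions made for this example only.

```python
from fractions import Fraction

def integrate_step(pieces, a, b):
    """Integrate a step function over [a, b).

    `pieces` is a list of (left, right, value) triples: the function is the
    sum of value * (indicator of [left, right)) over all triples, so pieces
    that overlap simply add up.
    """
    total = Fraction(0)
    for left, right, value in pieces:
        lo, hi = max(left, a), min(right, b)
        if lo < hi:
            total += value * (hi - lo)   # value times length of the overlap
    return total

# f = 3 * 1_[0, 1/2) - 1 * 1_[1/4, 1)
f = [(Fraction(0), Fraction(1, 2), 3), (Fraction(1, 4), Fraction(1), -1)]
print(integrate_step(f, Fraction(0), Fraction(1)))
```

Exact rational arithmetic is used so that the finite sum is computed without rounding.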

An exception to this convention occurs when we consider convex or mono- 
tone functions, which we do not necessarily wish to ask to be step functions. 
When dealing with integrals, typically the function being integrated can be 
taken to be a step function. (This function might be the composition of a 
non-step function with a step function, which is still a step function.) 



Chapter 2 
Dyadic intervals 



2.1 The unit interval and dyadic subintervals 

Normally, a reference to "the unit interval" might suggest the interval [0, 1] 
in the real line. It will be convenient to use [0, 1) instead, for minor technical 
reasons (and one could easily work around this anyway). 

Definition 2.1 (Dyadic intervals in [0,1)) The dyadic subintervals of the
unit interval [0, 1) are the intervals of the form
$[j \, 2^{-k}, (j + 1) \, 2^{-k})$, where j and k are nonnegative integers,
and $j + 1 \le 2^k$. (Thus the length of such an interval is of the form
$2^{-k}$, where k is a nonnegative integer.)

In general one can define the dyadic intervals in R to be the intervals of 
the same form, except that j and k are allowed to be arbitrary integers. 

The half-open, half-closed condition leads to nice properties in terms of 
disjointness, as in the following lemmas. (With closed intervals one could get 
disjointness of interiors in similar circumstances. This would be fine in terms 
of integrals, measures, etc.) 

Lemma 2.2 (Partitions of [0, 1)) For each nonnegative integer k, [0, 1) is
the union of the dyadic subintervals of itself of length $2^{-k}$, and these
intervals are pairwise disjoint.

Lemma 2.3 (Comparing pairs of intervals) If $J_1$ and $J_2$ are two dyadic
subintervals of [0, 1), then either $J_1 \subseteq J_2$, or
$J_2 \subseteq J_1$, or $J_1 \cap J_2 = \emptyset$. (The first two
possibilities are not mutually exclusive, as one could have $J_1 = J_2$.)

These two lemmas are not hard to verify, just from the definitions.
(Exercise.) For the second lemma, one might be a bit more precise and say the
following. Suppose that $J_1$ and $J_2$ are dyadic subintervals of [0, 1), and
that $J_1$ has length $2^{-k_1}$, and $J_2$ has length $2^{-k_2}$. If
$k_1 \le k_2$ (so that $2^{-k_1} \ge 2^{-k_2}$), then either
$J_2 \subseteq J_1$ or $J_1 \cap J_2 = \emptyset$.
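
The trichotomy in Lemma 2.3 can be checked by brute force on the first few levels. In the sketch below, which is only an illustration, a dyadic interval $[j \, 2^{-k}, (j+1) \, 2^{-k})$ is encoded as the pair `(j, k)`; this encoding and the function names are assumptions of the example.

```python
from itertools import product

def contains(outer, inner):
    """True if the dyadic interval `inner` is a subset of `outer`.

    A pair (j, k) stands for [j * 2**-k, (j + 1) * 2**-k) with 0 <= j < 2**k.
    """
    (j1, k1), (j2, k2) = outer, inner
    # inner must be at least as fine as outer, and the interval of length
    # 2**-k1 that contains inner must be outer itself
    return k2 >= k1 and (j2 >> (k2 - k1)) == j1

def disjoint(a, b):
    """True if the two dyadic intervals have empty intersection."""
    (j1, k1), (j2, k2) = a, b
    k = max(k1, k2)
    # rescale both intervals to integer endpoints in units of 2**-k
    lo1, hi1 = j1 << (k - k1), (j1 + 1) << (k - k1)
    lo2, hi2 = j2 << (k - k2), (j2 + 1) << (k - k2)
    return hi1 <= lo2 or hi2 <= lo1

def all_dyadic(max_level):
    """All dyadic subintervals of [0, 1) of length at least 2**-max_level."""
    return [(j, k) for k in range(max_level + 1) for j in range(2 ** k)]

intervals = all_dyadic(4)
trichotomy = all(contains(a, b) or contains(b, a) or disjoint(a, b)
                 for a, b in product(intervals, repeat=2))
print(trichotomy)
```

Working with integer endpoints after rescaling avoids any floating-point comparison of endpoints.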

Lemma 2.4 (Partitions of dyadic intervals) If J is a dyadic subinterval
of [0, 1) of length $2^{-k}$, and if m is an integer greater than k, then J is
the union of the dyadic subintervals of itself of length $2^{-m}$ (and these
intervals are pairwise disjoint). Also, every dyadic subinterval of [0, 1) of
length $2^{-m}$ is contained in a dyadic subinterval of [0, 1) of length
$2^{-k}$ when $m \ge k$.

(Exercise.) 

Lemma 2.5 (Structure of unions of dyadic intervals) Let $\mathcal{F}$ be an
arbitrary nonempty collection of dyadic subintervals of [0, 1). Then there is
a subcollection $\mathcal{F}_0$ of $\mathcal{F}$ such that

(2.6)    $\bigcup_{J \in \mathcal{F}_0} J = \bigcup_{J \in \mathcal{F}} J$

and the elements of $\mathcal{F}_0$ are pairwise disjoint, i.e., if $J_1$,
$J_2$ are distinct elements of $\mathcal{F}_0$, then
$J_1 \cap J_2 = \emptyset$.

To prove this, we take $\mathcal{F}_0$ to be the set of maximal elements of
$\mathcal{F}$. Here "maximal" means maximal with respect to the ordering that
comes from set-theoretic inclusion, so that $J \in \mathcal{F}$ is maximal if
there is no element K of $\mathcal{F}$ such that $J \subseteq K$ and
$J \ne K$.

If J is any dyadic subinterval of [0, 1), then there are only finitely many
dyadic subintervals of [0, 1) which contain J as a subset. Indeed, if J has
length $2^{-j}$, then there is exactly one such dyadic interval of length
$2^{-k}$ for each nonnegative integer k, $k \le j$.

Using this, it is easy to see that every interval in $\mathcal{F}$ is
contained in a maximal interval in $\mathcal{F}$. In particular, the set
$\mathcal{F}_0$ of maximal elements of $\mathcal{F}$ is nonempty, since
$\mathcal{F}$ is nonempty, by assumption. We also obtain (2.6).

Any two maximal elements of $\mathcal{F}$ which are distinct are disjoint.
This follows from Lemma 2.3. This proves the second property of
$\mathcal{F}_0$ in Lemma 2.5.
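
For a finite collection, the construction in this proof is easy to carry out explicitly. The following sketch, an illustration only, again encodes $[j \, 2^{-k}, (j+1) \, 2^{-k})$ as a pair `(j, k)` (an assumption of the example), extracts the maximal elements, and compares the two unions in (2.6) at a fixed fine level.

```python
def contains(outer, inner):
    """True if the dyadic interval `inner` is a subset of `outer`."""
    (j1, k1), (j2, k2) = outer, inner
    return k2 >= k1 and (j2 >> (k2 - k1)) == j1

def maximal_elements(family):
    """Members of `family` that are not properly contained in another member."""
    family = set(family)
    return {J for J in family
            if not any(K != J and contains(K, J) for K in family)}

def cells(J, level):
    """Indices of the dyadic intervals of length 2**-level that make up J."""
    j, k = J
    return set(range(j << (level - k), (j + 1) << (level - k)))

F = {(0, 2), (1, 3), (0, 3), (1, 1), (5, 3), (3, 2)}
F0 = maximal_elements(F)
union_F = set().union(*(cells(J, 3) for J in F))
union_F0 = set().union(*(cells(J, 3) for J in F0))
print(sorted(F0), union_F == union_F0)
```

Here `cells` uses Lemma 2.4 to describe each interval as a union of intervals of length $2^{-3}$, so that comparing unions reduces to comparing finite sets of indices.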



2.2 Functions on the unit interval

Suppose that f is a function on the unit interval [0, 1) (real or
complex-valued). We assume that f is at least mildly well-behaved, so that it
makes sense to talk about integrals of f over subintervals of [0, 1) (and
with the customary basic properties).

Assume that a nonnegative integer k is also given. Let us define a function
$E_k(f)$ on [0, 1) as follows. If x is any element of [0, 1), then there is a
unique dyadic subinterval J of [0, 1) of length $2^{-k}$ which contains x, as
in Lemma 2.2. Then we set

(2.7)    $E_k(f)(x) = \frac{1}{|J|} \int_J f(y) \, dy$.

Note that $E_k(f)$ is linear in f, so that

(2.8)    $E_k(a f_1 + b f_2) = a \, E_k(f_1) + b \, E_k(f_2)$

when a, b are constants and $f_1$, $f_2$ are functions on [0, 1) (which are
at least mildly well-behaved, as before).

Lemma 2.9 (Some properties of $E_k(f)$) (a) For each f, $E_k(f)$ is constant
on the dyadic subintervals of [0, 1) of length $2^{-k}$.

(b) If f is constant on the dyadic subintervals of [0, 1) of length
$2^{-k}$, then $E_k(f) = f$. (In particular, note that $E_k(1) = 1$ for all
k.)

(c) For all f, if j is an integer such that $j \ge k$, then
$E_j(E_k(f)) = E_k(f)$. Also, $E_k(E_j(f)) = E_k(f)$ in this case.

(d) If g is a function on [0, 1) which is constant on the dyadic subintervals
of [0, 1) of length $2^{-k}$, then $E_k(g f) = g \, E_k(f)$ for all f.

(Exercise. Note that the first part of (c) holds simply because $E_k(f)$ is
constant on dyadic subintervals of [0, 1) of length $2^{-j}$ when $j \ge k$,
while in the second part one is first averaging f on the (smaller) dyadic
intervals of length $2^{-j}$ to get $E_j(f)$, and then averaging the result on
the larger dyadic intervals of size $2^{-k}$ to get $E_k(E_j(f))$, and the
conclusion is that this is the same as averaging over the dyadic intervals of
length $2^{-k}$ directly. As in Lemma 2.4, dyadic subintervals of [0, 1) of
size $2^{-k}$ are disjoint unions of dyadic intervals of size $2^{-j}$ when
$j \ge k$.)

Definition 2.10 (Dyadic step functions) A function f on [0, 1) is called
a dyadic step function if it is a finite linear combination of indicator
functions of dyadic subintervals of [0, 1).

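
For dyadic step functions the averaging operators can be computed with finite sums. The sketch below is an illustration under the assumption that f is stored as the list of its values on the $2^n$ dyadic intervals of length $2^{-n}$, and that $k \le n$; the name `E` and this representation are choices made for the example.

```python
from fractions import Fraction

def E(k, values):
    """Apply E_k to a dyadic step function.

    `values` lists the constant values of f on the 2**n dyadic intervals of
    length 2**-n, from left to right; k must satisfy 0 <= k <= n.  The result
    is returned in the same representation.
    """
    size = len(values) >> k                      # cells per interval of length 2**-k
    out = []
    for start in range(0, len(values), size):
        block = values[start:start + size]
        mean = sum(block, Fraction(0)) / size    # average of f over the interval
        out.extend([mean] * size)
    return out

f = [Fraction(v) for v in (4, 0, 2, 2, 1, 3, 5, 7)]   # here n = 3
print(E(1, f) == E(1, E(2, f)), E(2, E(1, f)) == E(1, f))
```

The two comparisons printed at the end correspond to the two identities in part (c) of Lemma 2.9 with k = 1 and j = 2.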
Lemma 2.11 Let f be a function on [0, 1). The following are equivalent:

(a) f is a dyadic step function;

(b) There is a nonnegative integer k such that f is constant on every
dyadic subinterval of [0, 1) of length $2^{-k}$;

(c) $E_k(f) = f$ for some nonnegative integer k (and hence for all
sufficiently large integers k).



(Exercise, using Lemma 



2.3 Haar functions 

Let I be a dyadic subinterval of [0, 1). Thus I is the disjoint union of two
dyadic subintervals $I_l$ and $I_r$ of half the size of I, corresponding to
the left and right halves of I. Define the Haar function $h_I(x)$ on [0, 1)
associated to the interval I by

(2.12)    $h_I(x) = -|I|^{-1/2}$ when $x \in I_l$
                 $= |I|^{-1/2}$ when $x \in I_r$
                 $= 0$ when $x \in [0, 1) \setminus I$.

Notice that

(2.13)    $\int_{[0,1)} h_I(x) \, dx = 0$

and

(2.14)    $\int_{[0,1)} h_I(x)^2 \, dx = 1$.

In addition to these functions, we define a special Haar function $h_0(x)$ on
[0, 1) by simply taking $h_0(x) = 1$ for all $x \in [0, 1)$. For this function
we also have that

(2.15)    $\int_{[0,1)} h_0(x)^2 \, dx = 1$.

If I and J are distinct dyadic subintervals of [0, 1), then $h_I$ and $h_J$
satisfy the orthogonality condition

(2.16)    $\int_{[0,1)} h_I(x) \, h_J(x) \, dx = 0$.



To see this, it is helpful to consider some cases separately. If I and J are
disjoint, then the product $h_I(x) \, h_J(x)$ is equal to 0 for all
$x \in [0, 1)$, and the integral vanishes automatically. Otherwise one of the
intervals I and J is contained in the other. We may as well assume that
$J \subseteq I$, since the two cases are completely symmetric. Since
$J \ne I$, J is either a subinterval of $I_l$ or of $I_r$. In both situations
we have that $h_I$ is constant on J, where $h_J$ is concentrated. As a result,
(2.16) follows from (2.13), applied to $h_J$.

In addition, if I is any dyadic subinterval of [0, 1), we have the
orthogonality relation

(2.17)    $\int_{[0,1)} h_0(x) \, h_I(x) \, dx = 0$,

again by (2.13).

Given a nonnegative integer k, consider the space of dyadic step functions
on [0, 1) which are constant on dyadic intervals of length $2^{-k}$. This
space contains the Haar functions $h_0$ and $h_I$ for all dyadic subintervals
I of [0, 1) such that $|I| \ge 2^{-k+1}$. In fact, every dyadic step function
on [0, 1) which is constant on dyadic intervals of length $2^{-k}$ is a linear
combination of these Haar functions. This can be verified using induction on
k. Alternatively, one can note that the space of these functions has
dimension $2^k$, and the total number of Haar functions being used is also
$2^k$; since these Haar functions are orthonormal, they are linearly
independent, and hence span the space. More precisely, for each nonnegative
integer $j < k$ there are $2^j$ Haar functions $h_I$ associated to dyadic
subintervals I of [0, 1) of length $2^{-j}$, and if we sum over j then the
total number of these is $\sum_{j=0}^{k-1} 2^j = 2^k - 1$. If we add the Haar
function $h_0$, then we obtain a total of $2^k$ Haar functions in this space.

In particular, the space of all dyadic step functions on [0, 1) is equal to the 
space of functions which can be given as finite linear combinations of Haar 
functions (associated to arbitrary dyadic subintervals of the unit interval, 
and also the Haar function ho). 

If f is a dyadic step function on [0, 1), then

(2.18)    $f = \langle f, h_0 \rangle \, h_0 + \sum_I \langle f, h_I \rangle \, h_I$,

where $\langle f, h_0 \rangle = \int_{[0,1)} f(x) \, h_0(x) \, dx$, and
analogously for $h_I$ instead of $h_0$, and where the sum is taken over all
dyadic subintervals I of [0, 1). The sum is in fact a finite sum, in the sense
that all but finitely many terms are equal to 0. This expression for f
follows from the fact that f is equal to a finite linear combination of the
Haar functions, and from the orthonormality conditions for the Haar functions
given above.

Similarly, if k is a nonnegative integer, and $E_k(f)$ is as defined in
Section 2.2, then

(2.19)    $E_k(f) = \langle f, h_0 \rangle \, h_0 + \sum_{|I| \ge 2^{-k+1}} \langle f, h_I \rangle \, h_I$.

Now the sum is taken over all dyadic subintervals I of [0, 1) whose length is
at least $2^{-k+1}$, and this is interpreted as being 0 when k = 0.
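
The expansions (2.18) and (2.19) can be tested numerically on a small example. The sketch below is only an illustration: functions are stored as lists of values on the $2^n$ dyadic intervals of length $2^{-n}$, floating-point arithmetic is used because $|I|^{-1/2}$ is irrational in general, and all function names are assumptions of the example.

```python
import math

def haar(j, k, n):
    """Values of h_I on the 2**n cells of length 2**-n, I = [j 2**-k, (j+1) 2**-k).

    Requires k < n so that both halves of I are unions of cells.
    """
    cells = 1 << n
    width = cells >> k                 # number of cells in I
    amp = 2.0 ** (k / 2)               # |I| ** (-1/2)
    h = [0.0] * cells
    for c in range(j * width, j * width + width // 2):
        h[c] = -amp                    # left half of I
    for c in range(j * width + width // 2, (j + 1) * width):
        h[c] = amp                     # right half of I
    return h

def inner(f, g):
    """Integral of f * g over [0, 1) for functions constant on the cells."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

def haar_sum(f, levels):
    """<f, h_0> h_0 plus the Haar terms for all I with |I| = 2**-k, k in `levels`."""
    n = len(f).bit_length() - 1
    h0 = [1.0] * len(f)
    total = [inner(f, h0)] * len(f)
    for k in levels:
        for j in range(1 << k):
            h = haar(j, k, n)
            c = inner(f, h)
            total = [t + c * v for t, v in zip(total, h)]
    return total

f = [4.0, 0.0, 2.0, 2.0, 1.0, 3.0, 5.0, 7.0]   # constant on intervals of length 1/8
full = haar_sum(f, range(3))     # all I with |I| >= 1/4, as in (2.18)
coarse = haar_sum(f, range(1))   # only I = [0, 1), as in (2.19) with k = 1
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(full, f)))
```

In this example `coarse` should agree with the averages of f over the two halves of [0, 1), which is what (2.19) predicts for k = 1.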

2.4 Binary sequences 

Let B denote the set of all binary sequences, i.e., sequences
$\{x_j\}_{j=1}^\infty$ such that $x_j = 0$ or 1 for all j. There is a natural
correspondence between binary sequences and real numbers in the interval
[0, 1], through standard binary expansions. More precisely, every binary
sequence defines an element of [0, 1], every element of [0, 1] has such a
binary expansion, and the binary expansion is unique except for rational
numbers of the form $j \, 2^{-k}$, where j and k are positive integers. If we
restrict ourselves to the interval [0, 1), then the binary sequence with all
1's does not correspond to a point in [0, 1), but this does not cause serious
trouble. In particular, these various exceptions do not cause trouble for the
integrals that we consider.

If one accounts for these exceptions in a suitable (and simple) way, then
dyadic intervals in [0, 1) correspond in a nice manner to subsets of B defined
by prescribing the values of a binary sequence $\{x_j\}_{j=1}^\infty$ for
$j = 1, \ldots, n$ for some $n \ge 1$, and leaving the other $x_j$'s free.
Similarly, dyadic step functions on [0, 1) correspond to functions on B which
depend only on the first n terms of a binary sequence for some positive
integer n (with suitable allowances at exceptional points, as above).
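
The correspondence between prescribed initial digits and dyadic intervals can be made concrete as follows. This is a small sketch with hypothetical function names; for the exceptional points it uses the terminating binary expansion, which is one possible way of handling them.

```python
from fractions import Fraction

def interval_from_prefix(bits):
    """Dyadic interval of the points whose binary expansion starts with `bits`.

    `bits` is a sequence (x_1, ..., x_n) of 0/1 values; the result is the pair
    of endpoints (left, right) of [j 2**-n, (j + 1) 2**-n).
    """
    n = len(bits)
    j = 0
    for b in bits:
        j = 2 * j + b              # j = x_1 2**(n-1) + ... + x_n
    return Fraction(j, 2 ** n), Fraction(j + 1, 2 ** n)

def prefix_of_point(x, n):
    """First n binary digits of x in [0, 1), using the terminating expansion."""
    bits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)
        bits.append(bit)
        x -= bit
    return bits

left, right = interval_from_prefix([1, 0, 1])
print(left, right, prefix_of_point(Fraction(11, 16), 3))
```

A point of [0, 1) lies in the interval returned by `interval_from_prefix(bits)` exactly when `prefix_of_point` returns `bits` for it, under the convention stated above.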



Chapter 3 

Convexity and some basic 
inequalities 

3.1 Convex functions 

Let I be a subset of the real line which is either an open interval, or an
open half-line, or the whole real line $\mathbf{R}$ itself. We shall always
assume that I is of this type in this section.

A real-valued function $\phi(x)$ on I is said to be convex if

(3.1)    $\phi(\lambda x + (1 - \lambda) y) \le \lambda \, \phi(x) + (1 - \lambda) \, \phi(y)$

for all $x, y \in I$ and $\lambda \in [0, 1]$. (Note that
$\lambda x + (1 - \lambda) y \in I$ when $x, y \in I$, so that
$\phi(\lambda x + (1 - \lambda) y)$ is defined.)

If $\phi(x)$ is affine, $\phi(x) = a x + b$ for some real numbers a and b,
then $\phi(x)$ is convex on the whole real line, and in fact one has equality
in (3.1) for all x, y, and $\lambda$. This equality for all x, y, and
$\lambda$ characterizes affine functions, as one can easily check.

For $\phi(x) = |x|$ we have that

(3.2)    $|x + y| \le |x| + |y|$ and $|\lambda x| = |\lambda| \, |x|$

for all $x, y, \lambda \in \mathbf{R}$, and the convexity condition (3.1)
follows easily from this.

If $\phi(x)$ is an arbitrary convex function on I, as above, and if c is a
real number, then the translation $\phi(x - c)$ of $\phi(x)$ is a convex
function on

(3.3)    $I + c = \{x + c : x \in I\}$.

In particular, for each real number c, $|x - c|$ defines a convex function on
$\mathbf{R}$.

Lemma 3.4 A function $\phi(x)$ on I is convex if and only if the following is
true: for any points $s, t, u \in I$ such that $s < t < u$,

(3.5)    $\frac{\phi(t) - \phi(s)}{t - s} \le \frac{\phi(u) - \phi(s)}{u - s} \le \frac{\phi(u) - \phi(t)}{u - t}$.

Roughly speaking, this condition says that the difference quotients of $\phi$
increase, or remain the same, as the points used in the difference quotient
become larger in $\mathbf{R}$.

If s, t, u are as in the lemma, then

(3.6)    $t = \frac{t - s}{u - s} \, u + \frac{u - t}{u - s} \, s$.

This is easy to check, and we have that

(3.7)    $\frac{t - s}{u - s} \in (0, 1), \quad \frac{u - t}{u - s} = 1 - \frac{t - s}{u - s}$.

Thus, if $\phi(x)$ is convex, then

(3.8)    $\phi(t) \le \frac{t - s}{u - s} \, \phi(u) + \frac{u - t}{u - s} \, \phi(s)$.

One can rewrite this in two different ways to get (3.5). To get the converse,
one can work backwards. That is, either one of the inequalities in (3.5) can
be rewritten to give (3.8), and this gives (3.1) when s, t, and u are obtained
in the natural way from x, y, and $\lambda$, following (3.6).
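
The condition (3.5) lends itself to a quick numerical experiment. The sketch below checks it for all triples from a finite grid, which can refute convexity but of course cannot prove it; the function names and the small tolerance for rounding are assumptions of the example.

```python
import itertools

def slope(phi, a, b):
    """Difference quotient of phi between the points a and b."""
    return (phi(b) - phi(a)) / (b - a)

def three_slopes_hold(phi, points, tol=1e-12):
    """Check (3.5) for every triple s < t < u taken from `points`."""
    for s, t, u in itertools.combinations(sorted(points), 3):
        left, middle, right = slope(phi, s, t), slope(phi, s, u), slope(phi, t, u)
        if not (left <= middle + tol and middle <= right + tol):
            return False
    return True

grid = [i / 4 for i in range(-8, 9)]
print(three_slopes_hold(abs, grid))                 # |x| is convex
print(three_slopes_hold(lambda x: x ** 3, grid))    # x**3 is not convex on R
```

For $x^3$ the check already fails at s = -2, t = -1, u = 0, where the first difference quotient is 7 and the second is 4.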



Lemma 3.9 A function $\phi(x)$ on I is convex if and only if the following
condition holds: for each $t \in I$ there is a real-valued affine function
$A(x)$ on $\mathbf{R}$ such that $A(t) = \phi(t)$ and $A(x) \le \phi(x)$ for
all $x \in I$.

To see that this condition is sufficient for $\phi$ to be convex, one can
apply the hypothesis with $t = \lambda x + (1 - \lambda) y$ (given x, y, and
$\lambda$ in the usual manner) to get an affine function $A(x)$ as above, and
then observe that

(3.10)    $\phi(\lambda x + (1 - \lambda) y) = A(\lambda x + (1 - \lambda) y)$
                 $= \lambda \, A(x) + (1 - \lambda) \, A(y)$
                 $\le \lambda \, \phi(x) + (1 - \lambda) \, \phi(y)$.

Conversely, suppose that $\phi(x)$ is convex, and let $t \in I$ be given. We
want to choose a real number a so that $A(x) = \phi(t) + a \, (x - t)$
satisfies $A(x) \le \phi(x)$ for all $x \in I$. In other words, the latter
condition asks that

(3.11)    $a \, (x - t) \le \phi(x) - \phi(t)$

for all $x \in I$. This is automatic when x = t, and we can rewrite the
inequality as

(3.12)    $a \le \frac{\phi(x) - \phi(t)}{x - t}$

when $x > t$, and as

(3.13)    $\frac{\phi(t) - \phi(x)}{t - x} \le a$

when $x < t$.

From Lemma 3.4 we have that

(3.14)    $\frac{\phi(t) - \phi(s)}{t - s} \le \frac{\phi(u) - \phi(t)}{u - t}$

for all $s, u \in I$ such that $s < t < u$. This implies that

(3.15)    $D_l = \sup \Big\{ \frac{\phi(t) - \phi(s)}{t - s} : s \in I, \ s < t \Big\}$

and

(3.16)    $D_r = \inf \Big\{ \frac{\phi(u) - \phi(t)}{u - t} : u \in I, \ u > t \Big\}$

are well-defined (i.e., that the set of numbers of which the supremum is taken
is bounded from above, and that the set of numbers of which the infimum is
taken is bounded from below), and that

(3.17)    $D_l \le D_r$.

To get (3.12) and (3.13), we want to choose $a \in \mathbf{R}$ so that

(3.18)    $D_l \le a \le D_r$,

and this we can do. This completes the proof of Lemma 3.9.

3.2 Jensen's inequality

Let I be as in the previous section, and suppose that $\phi(x)$ is a convex
function on I. Also let J be an interval in $\mathbf{R}$ (with nonzero length)
and let $f(x)$ be a function on J such that $f(x) \in I$ for all $x \in J$.
Then $|J|^{-1} \int_J f(x) \, dx \in I$, and

(3.19)    $\phi\Big( |J|^{-1} \int_J f(x) \, dx \Big) \le |J|^{-1} \int_J \phi(f(x)) \, dx$.

This is known as Jensen's inequality.

Let us first consider a version of this for sums. Suppose that
$x_1, x_2, \ldots, x_n$ are elements of I, and that
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are nonnegative real numbers such
that

(3.20)    $\sum_{i=1}^n \lambda_i = 1$.

Then $\sum_{i=1}^n \lambda_i x_i \in I$ and

(3.21)    $\phi\Big( \sum_{i=1}^n \lambda_i x_i \Big) \le \sum_{i=1}^n \lambda_i \, \phi(x_i)$.

This is equivalent to (3.19) for step functions $f(x)$. (For more general
functions, one should be a bit more careful.)
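
The inequality (3.21) is easy to try out on random data. The sketch below is an illustration with assumed names; it computes the difference between the two sides for a given function and a given choice of points and weights.

```python
import math
import random

def jensen_gap(phi, xs, weights):
    """Right side minus left side of (3.21); nonnegative when phi is convex."""
    mean = sum(w * x for w, x in zip(weights, xs))
    return sum(w * phi(x) for w, x in zip(weights, xs)) - phi(mean)

random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(6)]
raw = [random.random() for _ in range(6)]
weights = [r / sum(raw) for r in raw]          # nonnegative, summing to 1

print(jensen_gap(math.exp, xs, weights) >= 0)
print(jensen_gap(abs, xs, weights) >= 0)
```

For a concave function such as $-x^2$ the same quantity is nonpositive, since the inequality is reversed.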

The definition of convexity (3.1) is the same as (3.21) when n = 2, and one
can use it repeatedly to get the general case (via induction). Alternatively,
one can use the characterization of convexity in Lemma 3.9, and make an
argument very similar to the one used in (3.10) in order to get (3.21).

Let us note that

(3.22)    $\phi(t) = t^p$ is a convex function on $[0, \infty)$
          when p is a real number such that $p \ge 1$.

This is well known. Although our current convention is to use open intervals
and half-lines for the domains of convex functions, it is easy to allow for
the endpoint here, with properties like those that have been discussed. In
particular, we have that

(3.23)    $\Big( |J|^{-1} \int_J f(x) \, dx \Big)^p \le |J|^{-1} \int_J f(x)^p \, dx$

for nonnegative functions $f(x)$ on an interval J.



3.3 Hölder's inequality

Let p, q be real numbers such that $p, q > 1$ and

(3.24)    $\frac{1}{p} + \frac{1}{q} = 1$.

In this case we say that p and q are conjugate exponents.

Fix an interval J in $\mathbf{R}$. If $f(x)$ and $g(x)$ are nonnegative
functions on J, then Hölder's inequality states that

(3.25)    $\int_J f(x) \, g(x) \, dx \le \Big( \int_J f(y)^p \, dy \Big)^{1/p} \Big( \int_J g(z)^q \, dz \Big)^{1/q}$.

(As usual, one should make some modest assumptions so that the integrals
make sense, and by our conventions one is free to restrict one's attention to
step functions.)

We can allow one of p and q to be 1 and the other to be $\infty$ (which is
consistent with (3.24)). If p = 1 and $q = \infty$, then the analogue of
(3.25) is that

(3.26)    $\int_J f(x) \, g(x) \, dx \le \Big( \int_J f(y) \, dy \Big) \Big( \sup_{z \in J} g(z) \Big)$

(which is very simple).

Let us prove Hölder's inequality in the case where $p, q > 1$. We begin
with some preliminary observations. The inequality is trivial if f or g is
identically 0, since the left side of (3.25) is then 0. Thus we assume that
neither is identically 0, so that

(3.27)    $\Big( \int_J f(y)^p \, dy \Big)^{1/p}$

and

(3.28)    $\Big( \int_J g(z)^q \, dz \Big)^{1/q}$

are both nonzero.

Next, we may assume that (3.27) and (3.28) are both equal to 1. In other
words, if we can prove (3.25) in this special case, then it follows in
general, by multiplying f and g by constants.

Our basic starting point is that

(3.29)    $s \, t \le \frac{s^p}{p} + \frac{t^q}{q}$

for all nonnegative real numbers s and t. This is a version of the
geometric-arithmetic mean inequalities, and it is the same as the convexity of
the exponential function.

Applying this with $s = f(x)$ and $t = g(x)$ and integrating, we are led to

(3.30)    $\int_J f(x) \, g(x) \, dx \le \frac{1}{p} \int_J f(x)^p \, dx + \frac{1}{q} \int_J g(x)^q \, dx$.

When (3.27) and (3.28) are equal to 1, the right side is equal to
$1/p + 1/q = 1$. This gives (3.25) in that case, which is what we wanted.

Let us note the version of Hölder's inequality for sums. Namely,

(3.31)    $\sum_k a_k \, b_k \le \Big( \sum_k a_k^p \Big)^{1/p} \Big( \sum_k b_k^q \Big)^{1/q}$

holds for arbitrary nonnegative real numbers $a_k$, $b_k$ when p and q are
conjugate exponents. If p = 1 and $q = \infty$, then this should be
interpreted as

(3.32)    $\sum_k a_k \, b_k \le \Big( \sum_j a_j \Big) \Big( \sup_k b_k \Big)$,

which is very simple, as before. One can prove (3.31) in the same manner as
before, or derive it from the previous version.

The p = q = 2 case of (3.31) is the Cauchy-Schwarz inequality. The
p = q = 2 case of (3.25) is also called a Cauchy-Schwarz inequality.

Observe that (3.23) can be derived from Hölder's inequality (3.25) by
taking g = 1.
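
Here is a small numerical look at (3.31), offered as an illustration rather than as part of the argument. The helper names are assumptions of the example, and the exponent p is assumed to be strictly greater than 1 so that the conjugate exponent is finite.

```python
def p_sum(values, p):
    """(sum of v**p) ** (1/p) for nonnegative entries."""
    return sum(v ** p for v in values) ** (1 / p)

def holder_sides(a, b, p):
    """Left and right sides of (3.31) for nonnegative sequences a, b and p > 1."""
    q = p / (p - 1)                    # conjugate exponent: 1/p + 1/q = 1
    return sum(x * y for x, y in zip(a, b)), p_sum(a, p) * p_sum(b, q)

a = [1.0, 2.0, 0.5, 3.0]
b = [2.0, 0.0, 4.0, 1.0]
for p in (1.5, 2.0, 3.0):
    lhs, rhs = holder_sides(a, b, p)
    print(p, lhs <= rhs)
```

When p = 2 and the two sequences coincide, the two sides agree up to rounding, which is the equality case of the Cauchy-Schwarz inequality.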



3.4 Minkowski's inequality 

Fix an interval J in $\mathbf{R}$. Let f and g be nonnegative functions on J,
and let p be a real number, $p \ge 1$. Then

(3.33)    $\Big( \int_J (f(x) + g(x))^p \, dx \Big)^{1/p} \le \Big( \int_J f(x)^p \, dx \Big)^{1/p} + \Big( \int_J g(x)^p \, dx \Big)^{1/p}$.

This is Minkowski's inequality.

The analogue of this for $p = \infty$ is

(3.34)    $\sup_{x \in J} (f(x) + g(x)) \le \sup_{x \in J} f(x) + \sup_{x \in J} g(x)$.

This inequality is easy to verify.

The p = 1 case of (3.33) is clear (and with equality), and so we focus
now on $1 < p < \infty$. We shall describe two arguments, the first employing
Hölder's inequality.

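One begins by writing

(3.35)    $(f(x) + g(x))^p = f(x) \, (f(x) + g(x))^{p-1} + g(x) \, (f(x) + g(x))^{p-1}$.

Thus

(3.36)    $\int_J (f(x) + g(x))^p \, dx = \int_J f(x) \, (f(x) + g(x))^{p-1} \, dx + \int_J g(x) \, (f(x) + g(x))^{p-1} \, dx$.

If $q > 1$ is the conjugate exponent of p, $1/p + 1/q = 1$, then Hölder's
inequality implies that

(3.37)    $\int_J f(x) \, (f(x) + g(x))^{p-1} \, dx \le \Big( \int_J f(y)^p \, dy \Big)^{1/p} \Big( \int_J (f(z) + g(z))^{q(p-1)} \, dz \Big)^{1/q}$.

Because of the choice of q, we have that $q \, (p - 1) = p$, and hence

(3.38)    $\int_J f(x) \, (f(x) + g(x))^{p-1} \, dx \le \Big( \int_J f(y)^p \, dy \Big)^{1/p} \Big( \int_J (f(z) + g(z))^p \, dz \Big)^{1 - 1/p}$.

Similarly,

(3.39)    $\int_J g(x) \, (f(x) + g(x))^{p-1} \, dx \le \Big( \int_J g(y)^p \, dy \Big)^{1/p} \Big( \int_J (f(z) + g(z))^p \, dz \Big)^{1 - 1/p}$,

and therefore

(3.40)    $\int_J (f(x) + g(x))^p \, dx \le \Big( \Big( \int_J f(y)^p \, dy \Big)^{1/p} + \Big( \int_J g(y)^p \, dy \Big)^{1/p} \Big) \Big( \int_J (f(z) + g(z))^p \, dz \Big)^{1 - 1/p}$.

It is easy to derive (3.33) from this, by dividing both sides by the last
factor when it is nonzero.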

Before discussing the second argument, let us mention that the version
of Minkowski's inequality for sums is

(3.41)    $\Big( \sum_i (a_i + b_i)^p \Big)^{1/p} \le \Big( \sum_i a_i^p \Big)^{1/p} + \Big( \sum_i b_i^p \Big)^{1/p}$,

where $a_i, b_i \ge 0$ for all i and $p \ge 1$. This can be proved in the same
manner as above. (One can also derive the version for sums from the version
for integrals, as well as the other way around.) The analogue for
$p = \infty$ is

(3.42)    $\sup_i (a_i + b_i) \le \sup_i a_i + \sup_i b_i$.

Let us go through the second argument in the case of sums. Fix p,
$1 < p < \infty$, and assume for the moment that

(3.43)    $\Big( \sum_i a_i^p \Big)^{1/p} = \Big( \sum_i b_i^p \Big)^{1/p} = 1$.

Suppose that t is a real number such that $0 \le t \le 1$. We would like to
show that

(3.44)    $\Big( \sum_i (t \, a_i + (1 - t) \, b_i)^p \Big)^{1/p} \le 1$.

This would follow from Minkowski's inequality, and it is not hard to verify
that one can derive Minkowski's inequality from this version.

We can rewrite (3.43) and (3.44) as

(3.45)    $\sum_i a_i^p = \sum_i b_i^p = 1$

and

(3.46)    $\sum_i (t \, a_i + (1 - t) \, b_i)^p \le 1$.

To go from (3.45) to (3.46) it suffices to know that

(3.47)    $(t \, a_i + (1 - t) \, b_i)^p \le t \, a_i^p + (1 - t) \, b_i^p$

for each i. This inequality follows from the convexity of the function
$s \mapsto s^p$ on $[0, \infty)$, as in (3.22) in Section 3.2.

3.5 p < 1

If $0 < p < 1$, then (3.33) and (3.41) no longer hold in general. One has the
alternatives

(3.48)    $\int_J (f(x) + g(x))^p \, dx \le \int_J f(x)^p \, dx + \int_J g(x)^p \, dx$

and

(3.49)    $\sum_i (a_i + b_i)^p \le \sum_i a_i^p + \sum_i b_i^p$.

These alternatives do not hold in general when $p > 1$ (and become equations
when p = 1).

The underlying building block for these inequalities is the fact that

(3.50)    $(c + d)^p \le c^p + d^p$

when c and d are nonnegative real numbers, and p is a real number such that
$0 < p \le 1$. Once one has (3.50), the previous inequalities follow simply by
integrating or summing.
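
The contrast between the ranges $p \ge 1$ and $0 < p < 1$ can be seen on very small examples. The following sketch, with assumed helper names, tests (3.41) and (3.49) on specific data; a single example can exhibit a failure but does not prove an inequality.

```python
def lp(values, p):
    """(sum of v**p) ** (1/p) for nonnegative entries and p > 0."""
    return sum(v ** p for v in values) ** (1 / p)

def minkowski_holds(a, b, p, tol=1e-12):
    """Test (3.41) for the given data."""
    return lp([x + y for x, y in zip(a, b)], p) <= lp(a, p) + lp(b, p) + tol

def power_sum_subadditive(a, b, p, tol=1e-12):
    """Test (3.49) for the given data."""
    lhs = sum((x + y) ** p for x, y in zip(a, b))
    return lhs <= sum(x ** p for x in a) + sum(y ** p for y in b) + tol

a, b = [1.0, 0.0], [0.0, 1.0]      # (3.41) fails for these when p = 1/2
c, d = [1.0], [1.0]                # (3.49) fails for these when p = 2
print(minkowski_holds(a, b, 2.0), minkowski_holds(a, b, 0.5))
print(power_sum_subadditive(c, d, 0.5), power_sum_subadditive(c, d, 2.0))
```

With p = 1/2 and the vectors a, b above, the left side of (3.41) is $(1 + 1)^2 = 4$ while the right side is 2; with p = 2 and c = d = 1, the left side of (3.49) is 4 while the right side is 2.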



Chapter 4 

Normed vector spaces 



In this book, all vector spaces use either the real or complex numbers as
their underlying scalar field. We may sometimes wish to restrict ourselves to
one or the other, but in many cases either is fine. Throughout this chapter,
we make the standing assumption that all vector spaces are finite-dimensional.
Given a real or complex number t, we let $|t|$ denote its absolute value or
modulus.



4.1 Definitions and basic properties 

Let V be a vector space (real or complex). By a norm we mean a nonnegative
real-valued function $\|\cdot\|$ on V which satisfies the following
properties: $\|v\| = 0$ if and only if v is the zero vector in V;
$\|t v\| = |t| \, \|v\|$ for all vectors $v \in V$ and all scalars t; and
$\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$. The last property is
called the triangle inequality for $\|\cdot\|$.

A vector space equipped with a choice of norm is called a normed vector
space.

If $B_1 = \{v \in V : \|v\| \le 1\}$ is the (closed) unit ball corresponding
to the norm $\|\cdot\|$, then $B_1$ is a convex subset of V. In other words,
if v, w are elements of $B_1$ and t is a real number, $0 \le t \le 1$, then
$t v + (1 - t) w \in B_1$. This is easy to derive from the homogeneity
property and triangle inequality for $\|\cdot\|$. Conversely, if one assumes
the homogeneity property for $\|\cdot\|$ and that the unit ball $B_1$ is
convex, then it is easy to show that the triangle inequality holds for
$\|\cdot\|$.

As a basic family of examples, take V to be $\mathbf{R}^n$ or
$\mathbf{C}^n$, and, for a given $1 \le p \le \infty$, set

(4.1)    $\|v\|_p = \Big( \sum_{j=1}^n |v_j|^p \Big)^{1/p}$

when $p < \infty$, and

(4.2)    $\|v\|_\infty = \max_{1 \le j \le n} |v_j|$.

Here $v_j$, $1 \le j \le n$, denote the components of v in $\mathbf{R}^n$ or
$\mathbf{C}^n$. That these are indeed norms is easy to check, using Section
3.4 for the triangle inequality.

Let V be a vector space, and let $\|\cdot\|$ be a norm on V. Notice that

(4.3)    $\big| \, \|v\| - \|w\| \, \big| \le \|v - w\|$

for all $v, w \in V$. This follows from

(4.4)    $\|v\| \le \|w\| + \|v - w\|$

and the analogous inequality with v and w interchanged, which are instances
of the triangle inequality.

Suppose that $V = \mathbf{R}^n$ or $\mathbf{C}^n$, which is not a real
restriction, since every (finite-dimensional) vector space is isomorphic to
one of these. Let $|x|$ denote the standard Euclidean norm on $\mathbf{R}^n$
or $\mathbf{C}^n$ (which is the same as the norm $\|x\|_2$ in (4.1)). Then
there is a positive constant C, depending on n and the given norm
$\|\cdot\|$, such that

(4.5)    $\|v\| \le C \, |v|$

for all $v \in V$. This is not hard to check.

From (4.3) and (4.5) we obtain in particular that $\|v\|$ is a continuous
real-valued function on V, with respect to the usual Euclidean metric and
topology. As a consequence, there is a positive real number $C'$ such that

(4.6)    $|v| \le C' \, \|v\|$

for all $v \in V$. More precisely, because of homogeneity, it suffices to
check this for v in the standard unit sphere, $|v| = 1$. We want to show that
$\|v\|$ is bounded from below by a positive constant on this set. The
definition of a norm ensures that $\|v\| > 0$ for all $v \ne 0$, and the
compactness of the unit sphere $\{v \in V : |v| = 1\}$ and the continuity of
$\|v\|$ then imply that $\|v\|$ is bounded from below by a positive real
number, as desired.

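
For a concrete instance of (4.5) and (4.6), consider the norm $\|\cdot\|_1$ on $\mathbf{R}^n$. By the Cauchy-Schwarz inequality one can take $C = \sqrt{n}$ in (4.5), and one can take $C' = 1$ in (4.6); these particular constants are supplied here for illustration and are not stated in the text above. The sketch below, with an assumed helper `p_norm`, samples random vectors and compares the two norms.

```python
import math
import random

def p_norm(v, p):
    """The norm (4.1) for 1 <= p < infinity, or (4.2) when p is math.inf."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1 / p)

random.seed(1)
n = 5
ratios = []
for _ in range(1000):
    v = [random.gauss(0, 1) for _ in range(n)]
    ratios.append(p_norm(v, 1) / p_norm(v, 2))

# Expected: 1 <= ||v||_1 / |v| <= sqrt(n) for every nonzero v.
print(min(ratios) >= 1 - 1e-12, max(ratios) <= math.sqrt(n) + 1e-12)
```

Random sampling only explores part of the unit sphere, so this is a sanity check on the constants rather than a computation of the best ones.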
4.2 Dual spaces and norms

Let V be a vector space, real or complex. By a linear functional on V we
mean a linear mapping from V into the field of scalars (the real or complex
numbers, as appropriate). The dual of V is the vector space of linear
functionals on V, a vector space over the same field of scalars. The dual
space of V is denoted $V^*$, and it has the same dimension as V.

Now suppose that $\|\cdot\|$ is a norm on V. The dual norm $\|\cdot\|^*$ on
$V^*$ is defined as follows. If $\lambda$ is an element of $V^*$, and thus a
linear functional on V, then

(4.7)    $\|\lambda\|^* = \sup \{ |\lambda(v)| : v \in V, \ \|v\| = 1 \}$.

The remarks near the end of Section 4.1 show that this supremum is finite.
This definition of $\|\lambda\|^*$ is equivalent to saying that

(4.8)    $|\lambda(v)| \le \|\lambda\|^* \, \|v\|$

for all $v \in V$, and that $\|\lambda\|^*$ is the smallest nonnegative real
number with this property. It is not hard to verify that $\|\cdot\|^*$ does
indeed define a norm on $V^*$.

Let us look at this in the context of the examples mentioned in Section 4.1.
That is, suppose that $V = \mathbf{R}^n$ or $\mathbf{C}^n$, and take
$\|\cdot\|_p$ for the norm on V, for some p which satisfies
$1 \le p \le \infty$.

We can identify $V^*$ with $\mathbf{R}^n$ or $\mathbf{C}^n$ (respectively) by
associating to each w in $\mathbf{R}^n$ or $\mathbf{C}^n$ the linear
functional $\lambda_w$ on V given by

(4.9)    $\lambda_w(v) = \sum_{j=1}^n w_j \, v_j$.

Let q, $1 \le q \le \infty$, be the exponent conjugate to p, so that
$1/p + 1/q = 1$. If $\|\cdot\| = \|\cdot\|_p$, let us check that
$\|\lambda_w\|^* = \|w\|_q$ for all w.

First, we have that

(4.10)    $|\lambda_w(v)| \le \|w\|_q \, \|v\|_p$

for all w and v by Hölder's inequality. To show that
$\|\lambda_w\|^* = \|w\|_q$, we would like to check that for each w there is a
nonzero v so that

(4.11)    $|\lambda_w(v)| = \|w\|_q \, \|v\|_p$.

Let w be given. We may as well assume that w ^ 0, since otherwise any v 
would do. Let us also assume for the moment that p > 1, so that q < oo. 
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Under these conditions, define v by 

(4.12) Vj =w-\wj\'^-^ 



when Wj ^ 0, and by vj = when wj = 0. Here w] denotes the complex 
conjugate of wj, which is not needed when we are working with real numbers 
instead of complex numbers. With this choice of f , we have that Xw{v) = 
= \\w\\'^. It remains to check that 



(4.13) ||t;||p=||lf; 



19-1 



in this case. If g = 1, then p = oo, and this equation is the same as saying 
that max \ vj\ = 1. Indeed, if g = 1, then \vj\ = 1 whenever vj 7^ 0, and this 
happens for at least one j because w 0. Thus max \vj\ = 1. If g > 1, then 
\vj\ = \wj\''~^ for all j, and one can verify (|4.13|) using the fact that p and q 



are conjugate exponents. 

Finally, if p = 1, so that $q = \infty$, then choose (a single) k, $1 \le k \le n$, so
that $|w_k| = \max_j |w_j| = \|w\|_\infty$. Define v by $v_k = \overline{w_k} \, |w_k|^{-1}$ and $v_j = 0$ when
$j \ne k$. Then $\lambda_w(v) = |w_k| = \|w\|_\infty$ and $\|v\|_1 = 1$, and (4.11) holds in this
situation as well. This completes the proof that $\|\lambda_w\|^* = \|w\|_q$.
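The construction of the extremal vector v can be checked numerically. The following is a minimal sketch for real vectors only (so no complex conjugates are needed); the function names `p_norm`, `conjugate_exponent`, `extremal_vector`, and `pairing` are illustrative choices and do not come from the text.

```python
import math

def p_norm(x, p):
    """Return the p-norm of a real vector x, for 1 <= p <= infinity."""
    if math.isinf(p):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def conjugate_exponent(p):
    """Return q with 1/p + 1/q = 1 (q = inf when p = 1, q = 1 when p = inf)."""
    if p == 1:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1.0)

def extremal_vector(w, p):
    """Build v as in (4.12), or as in the p = 1 case, for a real vector w != 0."""
    q = conjugate_exponent(p)
    if math.isinf(q):
        # p = 1: put all of the weight on one coordinate of largest absolute value.
        k = max(range(len(w)), key=lambda j: abs(w[j]))
        return [w[k] / abs(w[k]) if j == k else 0.0 for j in range(len(w))]
    # p > 1: v_j = w_j |w_j|^(q-2) when w_j != 0, and v_j = 0 otherwise.
    return [t * abs(t) ** (q - 2.0) if t != 0 else 0.0 for t in w]

def pairing(w, v):
    """The linear functional lambda_w applied to v, as in (4.9)."""
    return sum(a * b for a, b in zip(w, v))

w = [3.0, -4.0, 0.0, 1.0]
for p in (1, 1.5, 2, 3, math.inf):
    q = conjugate_exponent(p)
    v = extremal_vector(w, p)
    # The two printed numbers should agree up to rounding, as in (4.11).
    print(p, abs(pairing(w, v)), p_norm(w, q) * p_norm(v, p))
```

For each listed p the two printed quantities agree up to rounding error, which is the equality (4.11) for this particular w.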



4.3 Second duals 

Let V be a vector space, and V* its dual space. Since V* is a vector space
in its own right, we can take the dual of it to get the second dual V** of V.

There is a canonical isomorphism from V onto V**, which is defined as
follows. Let $v \in V$ be given. For each $\lambda \in V^*$, we get a scalar by taking
$\lambda(v)$. The mapping $\lambda \mapsto \lambda(v)$ defines a linear functional on V*, and hence
an element of V**. Since we can do this for every $v \in V$, we get a mapping
from V into V** which one can check is linear and an isomorphism.

Now suppose that we also have a norm $\|\cdot\|$ on V. This leads to a dual
norm $\|\cdot\|^*$ on V*, as in the previous section. By the same token, we get a
double dual norm $\|\cdot\|^{**}$ on V**. Using the canonical isomorphism between
V and V** just described, we can think of $\|\cdot\|^{**}$ as defining a norm on V.
Let us show that

(4.14)    $\|v\|^{**} = \|v\|$ for all $v \in V$.

Note that this holds for the p-norms $\|\cdot\|_p$ on $\mathbf{R}^n$ and $\mathbf{C}^n$ by the analysis of
their duals in the preceding section.

Let $v \in V$ be given. To prove (4.14) for this choice of v, it suffices to
establish the following two statements. First,

(4.15)    $|\lambda(v)| \le \|\lambda\|^* \, \|v\|$ for all $\lambda \in V^*$.

Second, there is a nonzero $\lambda_0 \in V^*$ such that

(4.16)    $\lambda_0(v) = \|\lambda_0\|^* \, \|v\|$.

The first statement comes from the definition of $\|\lambda\|^*$, and so we only need
to prove the second one. For this we may as well suppose that $v \ne 0$. We
shall apply the next extension result. (Alternatively, one could approach this 
as in Section 



Theorem 4.17 Let V be a vector space (real or complex), and let $\|\cdot\|$ be a
norm on V. Suppose that W is a vector subspace of V and that $\mu$ is a linear
functional on W such that

(4.18)    $|\mu(w)| \le \|w\|$ for all $w \in W$.

Then there is a linear functional $\widehat{\mu}$ on V which is an extension of $\mu$ and has
norm less than or equal to 1.



The existence of a nonzero $\lambda_0 \in V^*$ satisfying (4.16) follows easily from
this, by first defining $\lambda_0$ on the span of v so that $\lambda_0(v) = \|v\|$, and then
extending to a linear functional on V with norm 1.

To prove the theorem, we first assume that V is a real vector space.
Afterwards we shall discuss the complex case.

Let W and $\mu$ be given as in the theorem. For each nonnegative integer j
less than or equal to the codimension of W in V, we would like to show that
there is a vector subspace $W_j$ of V and a linear functional $\mu_j$ on $W_j$ such
that $W \subseteq W_j$, the codimension of $W_j$ in V is equal to the codimension of W
in V minus j, $\mu_j = \mu$ on W, and

(4.19)    $|\mu_j(w)| \le \|w\|$ for all $w \in W_j$.

If we can do this with j equal to the codimension of W in V, then $W_j$ would
be equal to V, and this would give a linear functional on V with the required
properties.

Let us show that we can do this for each j less than or equal to the
codimension of W in V by induction. For the base case of j = 0 we simply
take $W_0 = W$ and $\mu_0 = \mu$.

Suppose that j is a nonnegative integer strictly less than the codimension
of W in V such that $W_j$ and $\mu_j$ exist as above. We would like to choose $W_{j+1}$
and $\mu_{j+1}$ with the analogous properties for j + 1 instead of j.

Under these conditions, the codimension of $W_j$ in V is positive, so that
there is a vector z in V which does not lie in $W_j$. Fix any such z, and take
$W_{j+1}$ to be the span of $W_j$ and z. Thus the codimension of $W_{j+1}$ in V is the
codimension of $W_j$ in V minus 1, and hence is equal to the codimension of
W in V minus (j + 1), as desired.

We shall choose $\mu_{j+1}$ to be an extension of $\mu_j$ from $W_j$ to $W_{j+1}$. Let $\alpha$
be a real number, to be chosen later in the argument. If we set $\mu_{j+1}(z) = \alpha$,
then $\mu_{j+1}$ is determined on all of $W_{j+1}$ by linearity and the condition that
$\mu_{j+1}$ be an extension of $\mu_j$. Note that $W \subseteq W_{j+1}$ and $\mu_{j+1} = \mu$ on W,
because of the corresponding statements for $W_j$ and $\mu_j$.

The remaining point is to show that $\mu_{j+1}$ satisfies the counterpart of
(4.19) for j + 1, i.e., that

(4.20)    $|\mu_{j+1}(w)| \le \|w\|$ for all $w \in W_{j+1}$.

This is the same as saying that

(4.21)    $|\mu_{j+1}(u + t z)| \le \|u + t z\|$ for all $u \in W_j$ and $t \in \mathbf{R}$,

since $W_{j+1}$ is the span of $W_j$ and z. To establish this, it is enough to show
that

(4.22)    $|\mu_{j+1}(u + z)| \le \|u + z\|$ for all $u \in W_j$.

Indeed, the case where t = 0 in the previous statement corresponds exactly
to our induction hypothesis (4.19) for $\mu_j$ and $W_j$. If $t \ne 0$, then one can
eliminate it using linearity of $\mu_{j+1}$, homogeneity of the norm $\|\cdot\|$, and the
fact that $W_j$ is invariant under scalar multiplication (since it is a vector
subspace).

Let us rewrite (4.22) as

(4.23)    $|\mu_j(u) + \alpha| \le \|u + z\|$ for all $u \in W_j$.

In other words, we want to choose an $\alpha \in \mathbf{R}$ so that (4.23) holds. If we can
do this, then the rest works, and we get $\mu_{j+1}$ and $W_{j+1}$ with the desired
features.

Because we are in the real case, $\mu_j(u)$ is a real number for all $u \in W_j$,
and (4.23) is equivalent to

(4.24)    $-\mu_j(u) - \|u + z\| \le \alpha \le -\mu_j(u) + \|u + z\|$ for all $u \in W_j$.

To show that there is an $\alpha \in \mathbf{R}$ which satisfies this property, it is enough to
establish that

(4.25)    $-\mu_j(u_1) - \|u_1 + z\| \le -\mu_j(u_2) + \|u_2 + z\|$ for all $u_1, u_2 \in W_j$.

This condition is the same as

(4.26)    $\mu_j(u_2 - u_1) \le \|u_2 + z\| + \|u_1 + z\|$ for all $u_1, u_2 \in W_j$.

By the triangle inequality, it is enough to know that

(4.27)    $\mu_j(u_2 - u_1) \le \|u_2 - u_1\|$ for all $u_1, u_2 \in W_j$.

This last is a consequence of (4.19). Thus we can choose $\alpha \in \mathbf{R}$ with the
required properties, and the induction argument is now finished. This
completes the proof of Theorem 4.17 when V is a real vector space.
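To make a single extension step concrete, here is a small numerical sketch. The setting is an assumption chosen for illustration and is not taken from the text: V is the plane with the norm $|x_1| + |x_2|$, W is the first coordinate axis, $\mu(s, 0) = s$, and z = (0, 1). The bounds in (4.24) are only sampled at finitely many points u of W, so in general this gives an approximation of the admissible interval for $\alpha$, not a proof.

```python
def l1_norm(x):
    """The norm |x_1| + |x_2| on the plane, used here as the norm on V."""
    return sum(abs(t) for t in x)

def mu(u):
    """A linear functional on W = span{(1, 0)}: mu(s, 0) = s, so |mu(u)| <= ||u||."""
    return u[0]

def admissible_interval(norm, functional, z, samples):
    """Sample the lower and upper bounds for alpha appearing in (4.24)."""
    shifted = [(u, (u[0] + z[0], u[1] + z[1])) for u in samples]
    lower = max(-functional(u) - norm(uz) for u, uz in shifted)
    upper = min(-functional(u) + norm(uz) for u, uz in shifted)
    return lower, upper

z = (0.0, 1.0)                                          # a vector outside W
samples = [(-5.0 + 0.25 * k, 0.0) for k in range(41)]   # points (s, 0), -5 <= s <= 5
lower, upper = admissible_interval(l1_norm, mu, z, samples)

alpha = 0.5 * (lower + upper)   # any value between lower and upper would do

def mu_ext(x):
    """The extension to the span of W and z determined by mu_ext(z) = alpha."""
    return x[0] + alpha * x[1]

# Check the counterpart of (4.20) on a grid of points in the plane.
grid = [(-2.0 + 0.5 * i, -2.0 + 0.5 * j) for i in range(9) for j in range(9)]
bounded = all(abs(mu_ext(x)) <= l1_norm(x) for x in grid)
print(lower, upper, alpha, bounded)
```

In this example the sampled interval is [-1, 1], so the extension is far from unique; with the Euclidean norm in place of `l1_norm` the interval would shrink towards the single value 0 as the sample range grows.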

To handle the case of complex vector spaces, let us make the following 
remarks. Given a complex linear functional on a complex vector space, we 
can take its real part to get a linear functional on the vector space now 
viewed as a real vector space, i.e., where we forget about multiplication by i. 
Conversely, given a real-valued function h on a complex vector space which
is linear with respect to real scalars (and vector addition), there is a complex
linear functional on the vector space whose real part is h. Specifically, one
can take $h(w) - i\,h(iw)$ for this complex linear functional. If our complex
vector space is equipped with a norm, then the norm of a (complex) linear
functional on the space is equal to the norm of the real part of the functional,
viewed as a linear functional on the real version of the vector space (forgetting
about i). This is not hard to verify. The main point is that for each complex
linear functional $\lambda$ on the space and each vector v in the space, there is a
complex number $\beta$ such that $|\beta| = 1$ and $\lambda(\beta v) = \beta\,\lambda(v)$ is real. Also, the
norm of $\beta v$ is equal to the norm of v, by the properties of a norm on a
complex vector space.

Using these remarks, it is not hard to derive the version of Theorem 4.17
for complex vector spaces from the version for real vector spaces.

4.4 Linear transformations and norms

Suppose that $V_1$ and $V_2$ are vector spaces (both real or both complex) equipped
with norms $\|\cdot\|_1$ and $\|\cdot\|_2$ (where the subscripts are now simply labels for
arbitrary norms, rather than referring to the p-norms from Section 4.1). If
$T : V_1 \to V_2$ is a linear transformation, then the operator norm $\|T\|_{op}$ of T
(with respect to the given norms on $V_1$ and $V_2$) is defined by

(4.28)    $\|T\|_{op} = \sup\{\|T(v)\|_2 : v \in V_1,\ \|v\|_1 = 1\}$.

This is equivalent to saying that

(4.29)    $\|T(v)\|_2 \le \|T\|_{op} \, \|v\|_1$ for all $v \in V_1$,

and that $\|T\|_{op}$ is the smallest nonnegative real number with this property.

The finiteness of $\|T\|_{op}$ is easy to establish using the remarks near the end
of Section 4.1. Notice that the dual norm on the space of linear functionals
on a vector space is a special case of the operator norm, in which one takes
$V_2$ to be the 1-dimensional vector space of scalars, with the standard norm.

The set of all linear transformations from $V_1$ to $V_2$ is a vector space in
a natural way, using addition and scalar multiplication of linear
transformations (with the same scalar field as for $V_1$, $V_2$). Again, the dual space of a
vector space is a special case of this. It is not hard to check that the operator
norm $\|\cdot\|_{op}$ on the vector space of linear transformations from $V_1$ to $V_2$ is
indeed a norm in the sense described in Section 4.1.

Suppose that $V_3$ is another vector space, with the same field of scalars as
for $V_1$ and $V_2$, and that $\|\cdot\|_3$ is a norm on it. Suppose that in addition to
$T : V_1 \to V_2$ we have a linear mapping $U : V_2 \to V_3$. Then the composition
U T is a linear transformation from $V_1$ to $V_3$, and

(4.30)    $\|U T\|_{op,13} \le \|U\|_{op,23} \, \|T\|_{op,12}$,

where the subscripts in the operator norms indicate which vector spaces and
norms on them are being used. This inequality is easy to check.
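As an illustration of (4.30), one can take all three spaces to be coordinate spaces with the norm $\|\cdot\|_1$. The sketch below relies on the standard fact, not proved in the text, that the operator norm of a matrix acting between such spaces is its largest absolute column sum; the helper names `matmul` and `op_norm_l1` and the matrices are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def op_norm_l1(A):
    """Operator norm of x -> A x with the 1-norm on both sides:
    the largest absolute column sum (a standard fact, assumed here)."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

T = [[1.0, -2.0], [0.5, 3.0], [2.0, 0.0]]   # a linear map from R^2 to R^3
U = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.5]]     # a linear map from R^3 to R^2
UT = matmul(U, T)                           # the composition, from R^2 to R^2

print(op_norm_l1(UT), op_norm_l1(U) * op_norm_l1(T))
```

Here the norm of the composition is 4.5, while the product of the two norms is 15, so the inequality (4.30) can be strict.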



4.5 Linear transformations and duals 



Let $V_1$ and $V_2$ be vector spaces (both real or both complex), and let
$T : V_1 \to V_2$ be a linear mapping between them. Associated to T is a canonical
linear transformation $T' : V_2^* \to V_1^*$ called the transpose of T or dual linear
transformation, and it is defined by

(4.31)    $T'(\mu) = \mu \circ T$ for all $\mu \in V_2^*$.

In other words, if $\mu$ is a linear functional on $V_2$, then $\mu \circ T$ is a linear functional
on $V_1$, and this linear functional is $T'(\mu)$.

Sometimes one might call T' the adjoint of T, or denote it T*. We prefer 
not to do that, to avoid confusions with similar but distinct objects in the 
setting of inner product spaces. 

If $S : V_1 \to V_2$ is another linear transformation, and if a, b are scalars,
then a S + b T is a linear mapping from $V_1$ to $V_2$ whose dual $(a S + b T)'$ is
a S' + b T'. If $V_3$ is another vector space with the same field of scalars as
$V_1$, $V_2$, and if $U : V_2 \to V_3$ is a linear mapping, then we can consider the
composition $U T : V_1 \to V_3$. The dual $(U T)' : V_3^* \to V_1^*$ of U T is given by
T' U', as one can easily check.

The dual of the identity transformation on a vector space V is the identity
transformation on the dual space V*. A linear mapping $T : V_1 \to V_2$ is
invertible if and only if the dual transformation $T' : V_2^* \to V_1^*$ is invertible.

Now suppose that $\|\cdot\|_1$ and $\|\cdot\|_2$ are norms on $V_1$ and $V_2$. Associated
to these are the operator norm $\|\cdot\|_{op}$ for linear mappings from $V_1$ to $V_2$, and
the dual norms $\|\cdot\|_1^*$ and $\|\cdot\|_2^*$ on the dual spaces $V_1^*$ and $V_2^*$. Furthermore,
we can use the dual norms $\|\cdot\|_2^*$, $\|\cdot\|_1^*$ to get an operator norm $\|\cdot\|_{op*}$ for
linear mappings from $V_2^*$ to $V_1^*$. (This should not be confused with the dual
of $\|\cdot\|_{op}$, viewed as a norm in its own right.) With this notation,

(4.32)    $\|T\|_{op} = \|T'\|_{op*}$

for every linear mapping $T : V_1 \to V_2$.

To see this, one can begin by noticing that $\|T'\|_{op*} \le \|T\|_{op}$. This is
not hard to verify directly from the definitions. The opposite inequality can
be derived using linear functionals as in (4.16). Another way to look at
this is to apply the first inequality to T' instead of T, and with $V_2^*$ instead
of $V_1$, $V_1^*$ instead of $V_2$, etc. We have already seen that $V_1^{**}$ and $V_2^{**}$ are
canonically isomorphic to $V_1$ and $V_2$, respectively, and it is easy to check
that T'' corresponds to T under this isomorphism. The second dual norms
$\|\cdot\|_1^{**}$ and $\|\cdot\|_2^{**}$ correspond to $\|\cdot\|_1$ and $\|\cdot\|_2$, as in Section 4.3, and
the operator norm $\|\cdot\|_{op**}$ associated to the second duals thus reduces to
$\|\cdot\|_{op}$. In this way the first inequality applied to T' instead of T leads to
$\|T\|_{op} = \|T''\|_{op**} \le \|T'\|_{op*}$, and this together with the first inequality (as it
is, applied to T directly) implies (4.32).
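The identity (4.32) can be observed in a small example. In the sketch below, both spaces carry the norm $\|\cdot\|_1$, whose dual norm is $\|\cdot\|_\infty$ by Section 4.2, and the transpose T' is represented by the transposed matrix. The suprema are computed by brute force over the extreme points of the unit balls, which suffices because a convex function on such a ball attains its maximum at an extreme point; this is a standard fact assumed here, and all names in the code are illustrative.

```python
from itertools import product

def l1(x):
    return sum(abs(t) for t in x)

def linf(x):
    return max(abs(t) for t in x)

def apply(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def op_norm(T):
    """Norm of T with the 1-norm on both sides: maximize over the standard
    basis vectors, which (up to sign) are the extreme points of the unit ball."""
    n = len(T[0])
    basis = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    return max(l1(apply(T, e)) for e in basis)

def op_norm_dual(T):
    """Norm of the transpose with the dual (max) norms on both sides:
    maximize over the sign vectors, the extreme points of that unit ball."""
    Tt = transpose(T)
    m = len(T)
    return max(linf(apply(Tt, list(s))) for s in product((-1.0, 1.0), repeat=m))

T = [[1.0, -2.0], [0.5, 3.0], [2.0, 0.0]]   # a linear map from R^2 to R^3
print(op_norm(T), op_norm_dual(T))
```

Both quantities come out equal to 5 for this matrix, in agreement with (4.32).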

4.6 Inner product spaces 

Let V be a vector space. An inner product on V is a scalar-valued function
$\langle\cdot,\cdot\rangle$ on $V \times V$ with the following properties: (1) for each $w \in V$, the function
$v \mapsto \langle v, w\rangle$ is a linear functional on V; (2) if V is a real vector space, then

(4.33)    $\langle w, v\rangle = \langle v, w\rangle$ for all $v, w \in V$,

and if V is a complex vector space, then

(4.34)    $\langle w, v\rangle = \overline{\langle v, w\rangle}$ for all $v, w \in V$

(where $\overline{a}$ denotes the complex conjugate of the complex number a); (3) the
inner product is positive definite, in the sense that $\langle v, v\rangle$ is a positive real
number (whether V is real or complex) for all $v \in V$ such that $v \ne 0$. Note
that $\langle v, w\rangle = 0$ whenever either v or w is the zero vector by the first two
properties of an inner product.

The first two properties of an inner product imply that for each $v \in V$,
the function $w \mapsto \langle v, w\rangle$ is linear when V is a real vector space, and is
"conjugate linear" when V is a complex vector space (which means that
complex conjugations are applied to scalars at appropriate moments).

A vector space equipped with an inner product is called an inner product
space. The inner product $\langle\cdot,\cdot\rangle$ leads to a norm on the vector space, by setting

(4.35)    $\|v\| = \langle v, v\rangle^{1/2}$.

The inner product and norm satisfy the Cauchy-Schwarz inequality, given by

(4.36)    $|\langle v, w\rangle| \le \|v\| \, \|w\|$

for all v and w in the vector space. This is well known, and the fact that $\|\cdot\|$
satisfies the triangle inequality is normally derived from this.

For each positive integer n, the standard inner products on $\mathbf{R}^n$ and $\mathbf{C}^n$
are given by

(4.37)    $\langle x, y\rangle = \sum_{j=1}^n x_j y_j$

on $\mathbf{R}^n$ and

(4.38)    $\langle v, w\rangle = \sum_{j=1}^n v_j \overline{w_j}$

on $\mathbf{C}^n$. The associated norms are the standard Euclidean norms on $\mathbf{R}^n$ and
$\mathbf{C}^n$, and are the same as $\|\cdot\|_2$ in Section 4.1.

If $(V, \langle\cdot,\cdot\rangle)$ is an inner product space, and if v, w are two vectors in V,
then v and w are said to be orthogonal if

(4.39)    $\langle v, w\rangle = 0$.

This condition is symmetric in v and w. When v and w are orthogonal, we
have that

(4.40)    $\|v + w\| = (\|v\|^2 + \|w\|^2)^{1/2}$.

A collection of vectors is said to be orthogonal if any two vectors in the 
collection are orthogonal. A collection of vectors is said to be orthonormal if 
it is an orthogonal collection of vectors, and all of the vectors have norm 1. 

Any collection of nonzero orthogonal vectors is automatically linearly 
independent (as one can easily verify). A collection of vectors in V is said to 
be an orthogonal basis in V if it is orthogonal and a basis in the usual sense. 
Since nonzero orthogonal vectors are automatically linearly independent, this 
amounts to saying that the collection is orthogonal, and that the vectors in
the collection are nonzero and span V. Similarly, an orthonormal basis is an 
orthonormal collection of vectors which is also a basis, which is equivalent to 
saying that it is an orthonormal collection which spans V. 

In $\mathbf{R}^n$ and $\mathbf{C}^n$ one has the standard bases consisting of the n vectors with
one component equal to 1, and the rest equal to 0. These bases are orthonormal
with respect to the standard inner products.

Every inner product space admits an orthonormal basis. This famous 
result is often established by starting with any basis of the vector space and 
converting it to one which is orthonormal by applying the Gram-Schmidt 
process. As a consequence, one obtains that every inner product space is
isomorphic to $\mathbf{R}^n$ or $\mathbf{C}^n$ with the standard inner product, according to whether
the vector space is real or complex, where n is the dimension of the initial
vector space.
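For readers who would like to see the Gram-Schmidt process written out, here is a minimal sketch for real coordinate vectors with the standard inner product (4.37). The function names are illustrative, and the components are subtracted one at a time from the running remainder, which is a common variant of the process.

```python
import math

def dot(x, y):
    """The standard inner product (4.37) on real coordinate vectors."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(basis):
    """Convert a list of linearly independent real vectors into an
    orthonormal list with the same span."""
    ortho = []
    for b in basis:
        r = list(b)
        # Remove from r its components along the vectors found so far.
        for e in ortho:
            c = dot(r, e)
            r = [ri - c * ei for ri, ei in zip(r, e)]
        length = math.sqrt(dot(r, r))
        if length < 1e-12:
            raise ValueError("the given vectors are not linearly independent")
        ortho.append([ri / length for ri in r])
    return ortho

basis = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
E = gram_schmidt(basis)
for e in E:
    print(e)
```

The vectors in `E` have norm 1 and are pairwise orthogonal up to rounding, so they form an orthonormal basis of the three-dimensional coordinate space.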

Remark 4.41 If $(V, \langle\cdot,\cdot\rangle)$ is an inner product space, and if $\|\cdot\|$ is the norm
associated to $\langle\cdot,\cdot\rangle$, then one can give a simple formula for the inner product
in terms of the norm, through polarization. In particular, the inner product is
uniquely determined by the norm, and a linear mapping on V which preserves
the norm also preserves the inner product.

For the norm by itself, one has the parallelogram law

(4.42)    $\|v + w\|^2 + \|v - w\|^2 = 2\,(\|v\|^2 + \|w\|^2)$ for all $v, w \in V$.

Conversely, if V is a vector space and $\|\cdot\|$ is a norm on V which satisfies
(4.42), then there is an inner product on V such that $\|\cdot\|$ is the associated
norm. This is a well-known fact. To establish this, one can start by defining
$\langle v, w\rangle$ as a function using the formula for the inner product in terms of the
norm when the former exists. The point is then to show that this does indeed
define an inner product if the norm satisfies (4.42).
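The remark does not write out the polarization formula. One standard form of it in the real case is $\langle v, w\rangle = (\|v + w\|^2 - \|v - w\|^2)/4$, and the following sketch uses that form as an assumption. It also evaluates the difference between the two sides of (4.42) for the Euclidean norm and for the norm $\|\cdot\|_1$, to show that the latter does not come from an inner product. All function names are illustrative.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm2(x):
    """The Euclidean norm, associated to the standard inner product."""
    return math.sqrt(dot(x, x))

def norm1(x):
    """The norm given by the sum of the absolute values of the coordinates."""
    return sum(abs(t) for t in x)

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def polarization(norm, v, w):
    """Real polarization formula: (||v + w||^2 - ||v - w||^2) / 4."""
    return (norm(add(v, w)) ** 2 - norm(sub(v, w)) ** 2) / 4.0

def parallelogram_defect(norm, v, w):
    """Left side of (4.42) minus the right side."""
    return (norm(add(v, w)) ** 2 + norm(sub(v, w)) ** 2
            - 2.0 * (norm(v) ** 2 + norm(w) ** 2))

v, w = [1.0, 2.0, -1.0], [3.0, 0.0, 2.0]
print(dot(v, w), polarization(norm2, v, w))   # these agree up to rounding

e1, e2 = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_defect(norm2, e1, e2))    # zero up to rounding
print(parallelogram_defect(norm1, e1, e2))    # nonzero
```

For the two standard basis vectors of the plane, the defect for $\|\cdot\|_1$ is 4, so (4.42) fails for that norm.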

4.7 Inner product spaces, continued 

Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space. For each $w \in V$, one can define a
linear functional $L_w$ on V by

(4.43)    $L_w(v) = \langle v, w\rangle$.

The mapping $w \mapsto L_w$ defines a mapping from V into its dual space V*.
This mapping is linear when V is a real vector space, and it is
conjugate-linear when V is a complex vector space. In either case, this mapping is
one-to-one and sends V onto V*, as one can check.

Using the norm $\|\cdot\|$ on V associated to the inner product, we get a dual
norm on V*. The dual norm of $L_w$ is then equal to $\|w\|$. Indeed, that the dual
norm is less than or equal to $\|w\|$ follows from the Cauchy-Schwarz inequality
(4.36). To get the opposite inequality, one can observe that $L_w(w) = \|w\|^2$.

Now let S be a vector subspace of V. The orthogonal complement $S^\perp$ of
S is defined by

(4.44)    $S^\perp = \{v \in V : \langle v, w\rangle = 0 \text{ for all } w \in S\}$.

This is also a vector subspace of V.

If S is not all of V, then $S^\perp$ contains a nonzero vector. Indeed, if S is not
all of V, then there is a linear functional on V which is equal to 0 on S but
which is not equal to 0 on all of V. This linear functional can be represented
as $L_w$ for some $w \in V$, and $w \ne 0$ since the functional is not 0 on all of V.
On the other hand, $w \in S^\perp$ because the functional vanishes on S.

The span of S and $S^\perp$ is equal to all of V. To see this, we can apply
the previous observation. Namely, the span of S and $S^\perp$ is automatically
a subspace of V, and if it is not all of V, then there is a nonzero vector
w in V such that w is orthogonal to all elements of the span of S and $S^\perp$.
In particular, w should be orthogonal to all elements of S, which means
that $w \in S^\perp$. Since w is also supposed to be orthogonal to all elements of
$S^\perp$, we conclude that w is orthogonal to itself. This implies that w = 0, a
contradiction.

The statement that the span of S and $S^\perp$ is equal to V can be reformulated
as saying that every vector v in V can be written as u + z, where $u \in S$
and $z \in S^\perp$. This decomposition of v is unique, because $S \cap S^\perp = \{0\}$.

We can express this decomposition in the form of a linear mapping
$P_S : V \to V$ such that $P_S(v) \in S$ and $v - P_S(v) \in S^\perp$ for all $v \in V$. This is called
the orthogonal projection of V onto S. It is also characterized
by the property that $P_S(v) = v$ for all $v \in S$ and $P_S(v) = 0$ for all $v \in S^\perp$.

Another important feature of the orthogonal projection is that it has
norm 1 in the operator norm associated to the inner product norm on V
(unless S is the trivial subspace consisting of only the zero vector 0). In
other words,

(4.45)    $\|P_S(v)\| \le \|v\|$ for all $v \in V$,

and equality holds when $v \in S$. This inequality follows easily from (4.40)
in Section 4.6. Conversely, a projection on an inner product space is an
orthogonal projection if it has norm 1. See Lemma 6.56 in Section 6.6.
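The orthogonal projection can be computed explicitly once an orthonormal basis of S is available. The sketch below assumes the standard formula $P_S(v) = \sum_i \langle v, e_i\rangle e_i$ for an orthonormal basis $e_1, \ldots, e_k$ of S, which is not stated in the text, and works with real coordinate vectors; the particular plane S and vector v are arbitrary choices.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def project(v, onb):
    """Orthogonal projection of v onto the span of the orthonormal list onb."""
    p = [0.0] * len(v)
    for e in onb:
        c = dot(v, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

s = 1.0 / math.sqrt(2.0)
onb = [[s, s, 0.0], [0.0, 0.0, 1.0]]   # an orthonormal basis of a plane S
v = [3.0, 1.0, 2.0]

Pv = project(v, onb)
residual = [a - b for a, b in zip(v, Pv)]   # v - P_S(v), which should lie in the complement

print(Pv, residual)
print(norm(Pv), norm(v))   # the first number should not exceed the second, as in (4.45)
```

Here the residual is orthogonal to both basis vectors of S up to rounding, and $\|P_S(v)\|^2 + \|v - P_S(v)\|^2 = \|v\|^2$, which is (4.40) applied to the decomposition of v.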



4.8 Separation of convex sets 

Let E be a nonempty closed convex subset of $\mathbf{R}^n$ for some positive integer
n, and let p be a point in $\mathbf{R}^n$ which does not lie in E. A famous separation
theorem states that there is a hyperplane H in $\mathbf{R}^n$ such that E lies on one
side of H and p lies on the other.

This theorem can be proved in the following manner. First, let q be an
element of E such that the distance from p to q in the usual Euclidean norm
is as small as possible. The existence of q follows from standard
considerations of continuity and compactness. Although E may not be compact,
because it may not be bounded, only a bounded subset of E is needed for
this minimization (namely, a part of E which is not too far from p).

Of course $p \ne q$, because p does not lie in E. Consider the hyperplane
$H_0$ in $\mathbf{R}^n$ which passes through q and which is orthogonal to $p - q$. It is not
too hard to show that E lies in one of the closed half-spaces bounded by $H_0$
(the one that does not contain p). (Exercise.) To get a hyperplane H which
strictly separates p from E, one can slide $H_0$ over towards p.
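Here is a small sketch of this construction in a case where the nearest point is easy to compute. The choice of E as the unit cube is an assumption made for illustration; for a cube, the nearest point in the Euclidean norm is obtained by clamping each coordinate to the interval [0, 1]. Since a linear functional on the cube attains its maximum at a vertex, it is enough to examine the vertices.

```python
from itertools import product

def nearest_point_in_cube(p):
    """Nearest point to p in the unit cube, found by clamping each coordinate."""
    return [min(1.0, max(0.0, t)) for t in p]

p = [2.0, 0.5, -1.0]              # a point outside the cube E
q = nearest_point_in_cube(p)      # the closest point of E to p
normal = [a - b for a, b in zip(p, q)]   # p - q, normal to the hyperplane through q

def lam(x):
    """The linear functional x -> <x, p - q>."""
    return sum(n * t for n, t in zip(normal, x))

# Slide the hyperplane through q halfway towards p: H = {x : lam(x) = c}.
c = 0.5 * (lam(q) + lam(p))

vertices = list(product((0.0, 1.0), repeat=len(p)))
max_on_E = max(lam(x) for x in vertices)

print(q, normal)
print(max_on_E, lam(q), c, lam(p))
```

In this example the functional is at most 1 on E, equals 1 at q, and equals 3 at p, so the hyperplane where it equals 2 strictly separates p from E.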

Now consider a slightly different situation, where p is an element of E,
but p does not lie in the interior of E. In other words, p is in the boundary
of E. In this case a related result is that there is a hyperplane H in $\mathbf{R}^n$
which passes through p and for which E is contained in one of the closed
half-spaces bounded by H.

If E has smooth boundary, then H should be taken to be exactly the
tangent hyperplane to E at p. In general, however, E can have corners at
the boundary.

At any rate, one can deal with this case using the previous assertion.
Specifically, because p does not lie in the interior of E, there are points $p_i$ in
$\mathbf{R}^n$ which are not in E and which are as close as one likes to p. This leads
to hyperplanes that pass near p and for which E is contained in one of the
corresponding closed half-spaces. To get a hyperplane that passes through p,
one can take a limit of preliminary hyperplanes of this type. More precisely,
one can start with a sequence of these preliminary hyperplanes for which
the associated points $p_i$ converge to p, and then use the compactness of the
unit sphere in $\mathbf{R}^n$ to pass to a subsequence of hyperplanes with unit normal
vectors converging to a unit vector in $\mathbf{R}^n$. The hyperplane in $\mathbf{R}^n$ that passes
through p and has this limiting vector as a normal vector also has E contained
in one of the closed half-spaces of which it is the boundary, as one can verify.

These separation properties can be rephrased in terms of linear functionals
on $\mathbf{R}^n$. Indeed, given any hyperplane H in $\mathbf{R}^n$, there is a nonzero linear
functional $\lambda$ on $\mathbf{R}^n$ and a real number c such that

(4.46)    $H = \{x \in \mathbf{R}^n : \lambda(x) = c\}$.

The separation properties of E and p can be described in terms of the values
of $\lambda$ on E and at p.

One can also look at separations of pairs of convex sets, rather than a
convex set and a single point. Specifically, let $E_1$ and $E_2$ be disjoint nonempty
convex subsets of $\mathbf{R}^n$. A useful trick is to consider the set of differences

(4.47)    $E_1 - E_2 = \{x - y : x \in E_1,\ y \in E_2\}$.

It is easy to see that $E_1 - E_2$ is also convex, and that 0 does not lie in $E_1 - E_2$,
since $E_1$ and $E_2$ are disjoint. However, $E_1 - E_2$ is not closed in general, even
if $E_1$ and $E_2$ are. (It is closed if one of $E_1$, $E_2$ is compact and the other
is closed.)

Let F denote the closure of $E_1 - E_2$. Thus F is a nonempty closed convex
subset of $\mathbf{R}^n$. Because 0 does not lie in $E_1 - E_2$, one can verify that 0 does
not lie in the interior of F, i.e., it either does not lie in F at all, or it lies
in the boundary of F. Hence there is a hyperplane H that passes through
0 such that F is contained in one of the two closed half-spaces bounded by
H, as before. This is equivalent to saying that there is a nonzero linear functional $\lambda$
on $\mathbf{R}^n$ which is nonnegative everywhere on $E_1 - E_2$. As a consequence, the
supremum of $\lambda$ on $E_2$ is bounded by the infimum of $\lambda$ on $E_1$.

The first result mentioned in this section has a simple and well-known
corollary, which is that a set E in $\mathbf{R}^n$ is closed and convex if and only if it
can be realized as the intersection of a family of closed half-spaces in $\mathbf{R}^n$. The
"if" part of this assertion is automatic, since closed half-spaces are closed and
convex, and these properties are preserved by intersections. For the converse,
if E is a closed and convex subset of $\mathbf{R}^n$, then one can compare E to the
intersection of all closed half-spaces in $\mathbf{R}^n$ that contain E. This intersection
contains E by definition, and it contains no other element of $\mathbf{R}^n$ when E is
closed and convex because of the separation theorem.

4.9 Some variations 

In this section, V will be a finite-dimensional real vector space. 

Definition 4.48 (Sublinear functions) A real-valued function p(v) on V
is said to be sublinear if $p(v + w) \le p(v) + p(w)$ for all $v, w \in V$ and
$p(t v) = t\,p(v)$ for all $v \in V$ and all nonnegative real numbers t. In particular,
p(0) = 0.

Unlike a norm, a sublinear function p(v) is not asked to satisfy
$p(-v) = p(v)$.

Theorem 4.49 Let V be a finite-dimensional real vector space, and let p be
a sublinear function on V. Suppose that W is a vector subspace of V and
that $\mu$ is a linear functional on W such that

(4.50)    $\mu(w) \le p(w)$ for all $w \in W$.

Then there is a linear functional $\widehat{\mu}$ on V which is an extension of $\mu$ and
satisfies

(4.51)    $\widehat{\mu}(v) \le p(v)$ for all $v \in V$.

Note that we simply have $\mu(w)$ and $\widehat{\mu}(v)$ in (4.50) and (4.51), rather than
their absolute values. If $p(-v) = p(v)$ for all $v \in V$, then this would not
make a real difference.

Theorem 4.49 can be proved in a similar manner as Theorem 4.17 in
Section 4.3. The first part of the proof is essentially the same except for
simple changes following the differences in the statements of the theorems.
This works up to (4.21), whose counterpart in the present setting is

(4.52)    $\mu_{j+1}(u + t z) \le p(u + t z)$ for all $u \in W_j$ and $t \in \mathbf{R}$.

To be more precise, for the proof we want to choose the real number $\alpha$ (which
$\mu_{j+1}(z)$ is assigned to be) so that (4.52) holds, and if we can do that, then
the theorem will follow.

For the next step we need a modification, which is to say that (4.52) can
be reduced to

(4.53)    $\mu_{j+1}(u + z) \le p(u + z)$ and $\mu_{j+1}(u - z) \le p(u - z)$ for all $u \in W_j$.

As before, the t = 0 case of (4.52) comes from the induction hypothesis (for
$\mu_j$ and $W_j$), and for $t \ne 0$ one can convert (4.52) into (4.53) by employing
the homogeneity of the sublinear function p(v). Now we have to treat t > 0
and t < 0 separately, since we are only assuming homogeneity for positive
real numbers.

We can rewrite (4.53) as

(4.54)    $\mu_j(u) + \alpha \le p(u + z)$ and $\mu_j(u) - \alpha \le p(u - z)$ for all $u \in W_j$,

using the definition of $\mu_{j+1}$, as in the previous situation. The problem is to
verify that $\alpha \in \mathbf{R}$ can be chosen so that this condition holds.
This requirement can be rephrased as

(4.55)    $\mu_j(u) - p(u - z) \le \alpha \le p(u + z) - \mu_j(u)$ for all $u \in W_j$.

The rest of the argument is nearly the same as before. To show that there is
an $\alpha \in \mathbf{R}$ for which (4.55) is satisfied, it suffices to verify that

(4.56)    $\mu_j(u_1) - p(u_1 - z) \le p(u_2 + z) - \mu_j(u_2)$ for all $u_1, u_2 \in W_j$.

This is equivalent to saying that

(4.57)    $\mu_j(u_1 + u_2) \le p(u_1 - z) + p(u_2 + z)$ for all $u_1, u_2 \in W_j$.

The subadditivity property of p(v) shows that this condition holds if

(4.58)    $\mu_j(u_1 + u_2) \le p(u_1 + u_2)$ for all $u_1, u_2 \in W_j$,

which is the same as

(4.59)    $\mu_j(u) \le p(u)$ for all $u \in W_j$.

This is exactly part of the "induction hypothesis" for $W_j$ and $\mu_j$ in this
setting, i.e., the counterpart of (4.50) for $W_j$ and $\mu_j$. Thus (4.56) is valid,
and it is possible to choose $\alpha \in \mathbf{R}$ with the desired feature. This completes
the proof of Theorem 4.49.



Now let us turn to convex cones. A subset C of V is called a convex cone
if

(4.60)    $x + y \in C$ whenever $x, y \in C$

and

(4.61)    $t x \in C$ whenever $x \in C$, $t > 0$.

In the second condition, t is a real number.

Clearly convex cones are convex sets. Conversely, C is a convex cone if it
is convex and satisfies (4.61).

A subset C of V is called an open convex cone, or a closed convex cone,
if it is a convex cone and if it is open or closed, respectively, as a subset of
V. For the latter we use a linear isomorphism between V and $\mathbf{R}^n$, where n is
the dimension of V, to define open and closed sets in V. Which isomorphism
one employs does not matter, because invertible linear mappings on $\mathbf{R}^n$ are
homeomorphisms.

For example, if C is the set of $x = (x_1, \ldots, x_n)$ in $\mathbf{R}^n$ such that the
coordinates of x are all positive, then C is an open convex cone in $\mathbf{R}^n$. If
C is the set of $x \in \mathbf{R}^n$ such that the coordinates $x_j$ are all nonnegative, then
C is a closed convex cone.






Here is a simple recipe for producing convex cones. If $C_0$ is a convex set
in the vector space V, then consider the set

(4.62)    $\{t x : t > 0,\ x \in C_0\}$.

One can check that this is a convex cone in V, and in fact it is an open
convex cone if $C_0$ is an open convex subset of V. If $C_0$ is a compact convex
subset of V which does not contain 0, then

(4.63)    $\{t x : t > 0,\ x \in C_0\} \cup \{0\}$

is a closed convex cone in V.

If P is an affine plane in V that does not pass through the origin, and $C_0$ is
contained in P, then (4.62) and (4.63) define convex cones whose intersection
with P is exactly $C_0$.

To avoid degeneracies, one sometimes asks that a convex cone C in V not
be contained in a proper vector subspace of V. One also sometimes asks that
a convex cone C not contain any lines. Let us make the standing convention
that

(4.64)    if C is a closed convex cone, then $0 \in C$.

In other words, a closed convex cone should not be the empty set.

Now suppose that a norm $\|\cdot\|$ on V has been chosen. If C is a nonempty
convex cone in V, define p(v) on V by

(4.65)    $p(v) = \inf\{\|v - x\| : x \in C\}$.

In other words, p(v) is the distance from v to C relative to the norm $\|\cdot\|$. It
is not hard to verify that p(v) is a sublinear function when C is a nonempty
convex cone. Note that the closure of C is a closed convex cone which leads
to the same function p(v).
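As a sanity check on the sublinearity of this distance function, one can take C to be the closed cone of vectors with nonnegative coordinates and use the Euclidean norm. In that case the nearest point of C to v is obtained by replacing the negative coordinates of v by 0, so that p(v) is the Euclidean norm of the negative part of v; this closed form is a standard fact assumed for the sketch below. Random sampling can only support the two properties in Definition 4.48, of course, not prove them.

```python
import math
import random

def dist_to_orthant(v):
    """Euclidean distance from v to the cone of vectors with nonnegative
    coordinates, i.e. the norm of the negative part of v."""
    return math.sqrt(sum(min(t, 0.0) ** 2 for t in v))

random.seed(0)
subadditive = True
homogeneous = True
for _ in range(1000):
    v = [random.uniform(-1.0, 1.0) for _ in range(4)]
    w = [random.uniform(-1.0, 1.0) for _ in range(4)]
    t = random.uniform(0.0, 5.0)
    s = [a + b for a, b in zip(v, w)]
    tv = [t * a for a in v]
    # Small tolerances allow for rounding error in the floating point arithmetic.
    if dist_to_orthant(s) > dist_to_orthant(v) + dist_to_orthant(w) + 1e-12:
        subadditive = False
    if abs(dist_to_orthant(tv) - t * dist_to_orthant(v)) > 1e-9:
        homogeneous = False

print(subadditive, homogeneous)
print(dist_to_orthant([1.0, -2.0]), dist_to_orthant([-1.0, 2.0]))
```

The last line prints 2.0 and 1.0, which shows that this p does not satisfy $p(-v) = p(v)$, in contrast with a norm.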

If C is a convex cone in V, then we can associate to it a subset C* of
the dual space V* of V, namely, the set of linear functionals on V which are
nonnegative on C. Actually, this makes sense for any subset C of V, but if
C is not already a convex cone, then one can take the convex cone that it
generates, and this convex cone will define the same set in V*. Similarly, the
closure of C leads to the same set in the dual, and adding 0 to C if it is not
already an element of C does not affect C*.

It is not hard to check that C* is always a closed convex cone in V*,
containing 0 in particular. This is called the dual cone to C.

If one takes C to be the degenerate cone consisting of only 0, then the
dual cone is all of V*. Similarly, if C is all of V, then the dual cone consists
of only the zero linear functional.

Suppose that V = R", and for C let us consider the closed convex cone 
of points X all of whose coordinates are nonnegative. A linear function A on 

can be given as 

n 

(4.66) X{x) =^aiXi 

i=l 

for some real numbers ai, . . . ,a„, and this linear functional is nonnegative 
on C exactly when Oj > for each i. 

Let $C$ be a closed convex cone in $V$, and let $C^*$ be the dual cone in $V^*$.
Associated to it is the second dual cone $C^{**}$ in $V^{**}$. As in Section 4.3, we
can identify $V^{**}$ with $V$ in a simple way, where each element of $V$ defines
a linear functional on $V^*$ by evaluation. It is easy to see that $C \subseteq C^{**}$. In
fact,

(4.67)    $C = C^{**}$.

To verify this, one wants to show that an element $v$ of $V$ does lie in $C$ if
every linear functional in $C^*$ is nonnegative at $v$. In other words, if $v \in V$
does not lie in $C$, then one would like to say that there is a linear functional
on $V$ which is nonnegative on $C$, but negative at $v$. This can be established
as in Section 4.8, or using Theorem 4.49. For the latter, one starts with a
linear functional on the 1-dimensional subspace spanned by $v$, and one uses a
sublinear function $p$ associated to $C$ as above. There is also a multiplication
by $-1$ involved, for switching between nonnegativity and nonpositivity on
$C$.

See Chapter III of [SteW2] for some very interesting topics concerning
convexity, duality, Fourier transforms, and holomorphic functions on certain
regions in $\mathbf{C}^n$.



Chapter 5 
Strict convexity 



5.1 Functions of one real variable 

Let $I$ be a nonempty subset of the real line $\mathbf{R}$ which is either an open interval,
an open half-line, or the whole real line. A real-valued function $\phi$ on $I$ is
said to be strictly convex if

(5.1)    $\phi(\lambda x + (1 - \lambda) y) < \lambda \phi(x) + (1 - \lambda) \phi(y)$

for all $x, y \in I$ such that $x \ne y$ and all $\lambda \in (0,1)$. Thus strictly convex
functions are convex in particular.

Lemma 5.2 A real-valued function $\phi$ on $I$ is strictly convex if and only if
for every point $t \in I$ there is a real-valued affine function $A(x)$ on $\mathbf{R}$ such
that $A(t) = \phi(t)$ and $A(x) < \phi(x)$ for all $x \in I \setminus \{t\}$.

This is the analogue of Lemma 3.9 in Section 3.1 for strictly convex functions.



To prove the "if" part of Lemma 5.2, let $x, y \in I$ and $\lambda \in (0,1)$ be given
and set $t = \lambda x + (1 - \lambda) y$. Then $t$ lies in $I$, and by hypothesis there is a
real-valued affine function $A(z)$ on $\mathbf{R}$ such that $A(t) = \phi(t)$ and $A(z) < \phi(z)$
for all $z \in I$, $z \ne t$. If $x \ne y$, then $x \ne t$ and $y \ne t$, so that $A(x) < \phi(x)$ and
$A(y) < \phi(y)$. Because $A(z)$ is affine, we have that

(5.3)    $A(t) = \lambda A(x) + (1 - \lambda) A(y)$,

and hence

(5.4)    $\phi(t) = A(t) < \lambda \phi(x) + (1 - \lambda) \phi(y)$.

This is exactly the inequality that we want.

Conversely, suppose that $\phi$ is a strictly convex function on $I$, and let $t$ be
an element of $I$. By Lemma 3.9 we know that there is a real-valued affine
function $A(x)$ on $\mathbf{R}$ such that $A(t) = \phi(t)$ and $A(x) \le \phi(x)$ for all $x \in I$.
We want to show that $A(x) < \phi(x)$ when $x \in I$, $x \ne t$.

Suppose to the contrary that $\phi(x) = A(x)$ for some $x \in I$, $x \ne t$. For
each $\lambda \in (0,1)$, we have that

(5.5)    $A(\lambda t + (1 - \lambda) x) \le \phi(\lambda t + (1 - \lambda) x)$

by the conditions on $A(y)$, and

(5.6)    $A(\lambda t + (1 - \lambda) x) = \lambda A(t) + (1 - \lambda) A(x)$,

since $A(y)$ is affine. Because $A(t) = \phi(t)$ and $A(x) = \phi(x)$, we obtain that

(5.7)    $\lambda \phi(t) + (1 - \lambda) \phi(x) \le \phi(\lambda t + (1 - \lambda) x)$.

This contradicts the strict convexity of $\phi$, and the lemma follows.

As a basic family of examples (from "calculus"), the functions $t \mapsto t^p$ on
$(0, \infty)$ are strictly convex when $p > 1$. One might as well say that these
functions are strictly convex on $[0, \infty)$, even if this is not an open half-line,
because they satisfy similar properties there (in terms of (5.1) and comparison
with affine functions).
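The criterion of Lemma 5.2 can be illustrated numerically for $\phi(x) = x^p$. The sketch below assumes that the affine function touching $\phi$ at $t$ is the tangent line from calculus, $A(x) = t^p + p\, t^{p-1} (x - t)$; the function name `tangent_gap` and the sample points are chosen only for illustration.

```python
def tangent_gap(p, t, x):
    """Return phi(x) - A(x), where phi(x) = x**p and A is the tangent line to phi at t."""
    phi_t = t ** p
    slope = p * t ** (p - 1)          # derivative of x**p at x = t
    return x ** p - (phi_t + slope * (x - t))

p, t = 3.0, 1.0
xs = [0.1 * k for k in range(1, 41)]  # sample points in (0, 4]
# Gaps at all sample points other than the point of tangency t.
gaps = [tangent_gap(p, t, x) for x in xs if abs(x - t) > 1e-9]
```

For $p = 3$ and $t = 1$ the gap equals $(x - 1)^2 (x + 2)$, which is positive for every $x > 0$ other than $x = 1$, in agreement with the strict inequality $A(x) < \phi(x)$ in the lemma.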



5.2 The unit ball in a normed vector space 



Let $(V, \|\cdot\|)$ be a normed vector space (real or complex). As in Section 4.1,
define the unit ball in $V$ to be the set

(5.8)    $B_1 = \{v \in V : \|v\| \le 1\}$.

To be more precise, this is sometimes called the "closed unit ball", to distinguish
it from the open unit ball, which is defined in the same manner except
that the inequality $\le$ is replaced with $<$.

Let us say that the unit ball $B_1$ in a normed vector space $(V, \|\cdot\|)$ is
strictly convex if for every $x, y \in V$ such that $\|x\| = \|y\| = 1$ and $x \ne y$ and
every real number $\lambda \in (0,1)$ we have that

(5.9)    $\|\lambda x + (1 - \lambda) y\| < 1$.

As basic examples, one can start with the fields of scalars $\mathbf{R}$ and $\mathbf{C}$, and
their standard norms. The unit balls for these are strictly convex in this
sense. More generally, the unit ball in any inner product space is strictly
convex. This is not difficult to verify, using the standard analysis of the
case of equality for the Cauchy-Schwarz inequality.

Now suppose that $V$ is $\mathbf{R}^n$ or $\mathbf{C}^n$, $n \ge 2$, equipped with a norm $\|\cdot\|_p$ as in
Section 4.1, $1 \le p \le \infty$. If $p = 1$ or $p = \infty$, then it is not hard to show that
the unit ball is not strictly convex. However, if $1 < p < \infty$, then the unit ball
is strictly convex. This assertion can be derived from the strict convexity of
the function $t \mapsto t^p$ on $[0, \infty)$, mentioned in the previous section. For this it
is convenient to reformulate the strict convexity of the unit ball as follows:
for each $x$ and $y$ in $\mathbf{R}^n$ or $\mathbf{C}^n$ such that

(5.10)    $\sum_{i=1}^n |x_i|^p = \sum_{i=1}^n |y_i|^p = 1$

and $x \ne y$, and for each $\lambda \in (0,1)$, we have that

(5.11)    $\sum_{i=1}^n |\lambda x_i + (1 - \lambda) y_i|^p < 1$.
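The contrast between $p = 1$, $p = \infty$, and $1 < p < \infty$ can be seen in a small numerical sketch. The helper names `p_norm` and `midpoint_norm` and the particular vectors are assumptions made for illustration, and only the midpoint ($\lambda = 1/2$) is tested.

```python
import math

def p_norm(x, p):
    """Return the l^p norm of a sequence of real or complex numbers (p = math.inf allowed)."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def midpoint_norm(x, y, p):
    """Norm of the midpoint (x + y) / 2, i.e. lambda = 1/2 in (5.9)."""
    return p_norm([(a + b) / 2 for a, b in zip(x, y)], p)

# p = 1: the vectors (1, 0) and (0, 1) are distinct points of norm 1,
# and their midpoint (1/2, 1/2) still has norm 1.
flat_1 = midpoint_norm([1.0, 0.0], [0.0, 1.0], 1)

# p = infinity: (1, 1) and (1, -1) have norm 1, and so does their midpoint (1, 0).
flat_inf = midpoint_norm([1.0, 1.0], [1.0, -1.0], math.inf)

# 1 < p < infinity: the midpoint of (1, 0) and (0, 1) has norm 2**(1/p - 1) < 1.
round_2 = midpoint_norm([1.0, 0.0], [0.0, 1.0], 2)
round_3 = midpoint_norm([1.0, 0.0], [0.0, 1.0], 3)
```

The first two values exhibit line segments contained in the unit sphere, so the unit ball is not strictly convex for $p = 1$ or $p = \infty$. The last two are consistent with strict convexity for $p = 2, 3$, although a single pair of points is of course not a proof.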



5.3 Linear functionals 

As in earlier chapters, we shall make the standing assumption that vector 
spaces have finite dimension in this section. 

Proposition 5.12 Let $(V, \|\cdot\|)$ be a normed vector space. The unit ball $B_1$
of $V$ is strictly convex if and only if for every nonzero linear functional $\mu$ on
$V$, there is a unique point $x \in B_1$ such that

(5.13)    $\mu(x) = \sup\{\mu(z) : z \in B_1\}$

when $V$ is a real vector space, and

(5.14)    $\mathrm{Re}\, \mu(x) = \sup\{\mathrm{Re}\, \mu(z) : z \in B_1\}$

when $V$ is a complex vector space. Here $\mathrm{Re}\, a$ denotes the real part of a
complex number $a$.

In other words, the last part says that there is a unique point in $B_1$ at
which $\mu$ or $\mathrm{Re}\, \mu$ attains its maximum. We can in fact rewrite (5.13) and
(5.14) as

(5.15)    $\mu(x) = \|\mu\|^*$

for both real and complex vector spaces, where $\|\mu\|^*$ denotes the dual norm
of $\mu$ (as in Section 4.2). In other words, $\mu(z)$ or $\mathrm{Re}\, \mu(z)$ should be equal to
$|\mu(z)|$ in order to be as large as possible, since they are always less than or
equal to $|\mu(z)|$, and equality can be arranged on $B_1$ by multiplying $z$ by a
scalar of magnitude 1.

Every finite-dimensional vector space is linearly isomorphic to a Euclidean 
space, and this permits one to apply standard results on Euclidean spaces 
pertaining to continuity, compactness, and so on. It also does not matter 
which linear isomorphism one uses for this, i.e., the results would be the 
same. The general fact that continuous real-valued functions on compact
sets attain their maxima can be employed to conclude that for any normed
vector space $(V, \|\cdot\|)$, every linear functional on $V$ attains its maximum on
the unit ball $B_1$ in $V$. More precisely, linear functionals and their real parts
are continuous with respect to this topology on $V$, coming from the usual
one on a Euclidean space, and the ball $B_1$ is compact because it is closed
and bounded. The comments near the end of Section 4.1 are relevant for the
latter.

Proposition 5.12 is concerned with the uniqueness of points at which a
maximum is attained. Let us begin with the "only if" part. Suppose that $B_1$
is strictly convex, and that $\mu$ is a nonzero linear functional on $V$. Assume,
for the sake of a contradiction, that there are two distinct points $x, y \in B_1$
at which the maximum is attained. As above, this means that

(5.16)    $\mu(x) = \mu(y) = \|\mu\|^*$

(whether $V$ is a real or complex vector space). We also have that
$\|x\| = \|y\| = 1$, since

(5.17)    $|\mu(w)| \le \|\mu\|^* \|w\| < \|\mu\|^*$ when $\|w\| < 1$.

For each real number $t \in (0,1)$, consider $t x + (1 - t) y \in V$. We have that

(5.18)    $\|t x + (1 - t) y\| \le 1$

by the usual properties of the norm. On the other hand,

(5.19)    $\mu(t x + (1 - t) y) = \|\mu\|^*$

because of (5.16). Thus

(5.20)    $\|t x + (1 - t) y\| = 1$,

as in (5.17). This contradicts the strict convexity of $B_1$. Therefore we obtain
that there is only one point in $B_1$ at which the maximum is attained when
$B_1$ is strictly convex.

For the "if" part of the proposition, let us assume for the sake of a contradiction
that there are points $x, y \in V$ such that $\|x\| = \|y\| = 1$, $x \ne y$,
and

(5.21)    $\|t x + (1 - t) y\| = 1$

for some real number $t \in (0,1)$. Set $u = t x + (1 - t) y$.

As in (4.16) in Section 4.3, there is a nonzero linear functional $\mu$ on $V$
such that $\mu(u) = \|\mu\|^*$. Consider $\mu(x)$, $\mu(y)$. On the one hand,

(5.22)    $\mu(u) = \mu(t x + (1 - t) y) = t \mu(x) + (1 - t) \mu(y)$.

On the other hand,

(5.23)    $|\mu(x)|, |\mu(y)| \le \|\mu\|^*$,

since $\|x\| = \|y\| = 1$. It follows that

(5.24)    $\mu(x) = \mu(y) = \|\mu\|^*$.

Thus we have more than one point in $B_1$ at which the maximum is attained.
This completes the proof of Proposition 5.12.



Proposition 5.25 Let $(V, \|\cdot\|)$ be a normed vector space, and let $\|\cdot\|^*$ be
the dual norm of $\|\cdot\|$ on the dual space $V^*$. The unit ball $B_1^*$ of $V^*$ is strictly
convex if and only if for every point $x \in V$ with $\|x\| = 1$ there is a unique
linear functional $\mu$ on $V$ such that $\|\mu\|^* = 1$ and $\mu(x) = 1$.

Here we are interchanging the roles of $x$ and $\mu$, compared to the situation
in Proposition 5.12. As before, it is the uniqueness of $\mu$ which is at issue,
since (4.16) in Section 4.3 implies that there is a $\mu \in V^*$ such that $\|\mu\|^* = 1$
and $\mu(x) = 1$ for a given $x \in V$ with $\|x\| = 1$.

One can derive Proposition 5.25 from Proposition 5.12 by applying the
latter to $(V^*, \|\cdot\|^*)$ instead of $(V, \|\cdot\|)$. The second dual of $V$ is now in
the position of the dual space before, and we know from Section 4.3 that the
second dual of $V$ is equivalent to $V$, both as a vector space and for the norm.



5.4 Uniqueness of points of minimal distance

Let us mention a variant of the "only if" part of Proposition 5.12.

Proposition 5.26 Suppose that $(V, \|\cdot\|)$ is a (finite-dimensional) normed
vector space in which the unit ball $B_1$ is strictly convex. Let $A$ be a closed
convex subset of $V$ (where "closed" can be defined in terms of a linear isomorphism
with a Euclidean space, as before), and let $w$ be any element of $V$.
Then there is a unique point $z \in A$ such that

(5.27)    $\|w - z\| = \inf\{\|w - u\| : u \in A\}$.

For instance, $A$ could be a vector subspace of $V$.

Let $A$, $w$, etc., be given as above. The existence of $z \in A$ which satisfies
(5.27) follows from general considerations of continuity and compactness
again. More precisely, $A$ may not be bounded, and hence not compact, but
one can reduce to that situation because only a bounded subset of $A$ is needed
for the infimum. In other words, points which are too far away will not have a
chance of minimizing the distance to $w$. Strictly speaking, this uses remarks
in Section 4.1, as does the continuity of $\|w - u\|$ as a function of $u$. For this
part of the argument one does not need the assumption that the unit ball $B_1$
in $V$ is strictly convex, nor does one need the convexity of $A$.

Suppose for the sake of a contradiction that there is a point $y \in A$, $y \ne z$,
which also satisfies

(5.28)    $\|w - y\| = \inf\{\|w - u\| : u \in A\}$.

In particular, $\|w - y\| = \|w - z\|$. Strict convexity of the unit ball $B_1$ in
$V$ implies that any nontrivial convex combination of $w - z$ and $w - y$ has
norm strictly less than $\|w - z\| = \|w - y\|$. On the other hand, a convex
combination of $w - z$ and $w - y$ can be written as $w - p$, where $p$ is a convex
combination of $z$ and $y$. This point $p$ lies in $A$, because $A$ is convex. Hence
$\|w - p\| \ge \|w - z\|$, since $z$ was chosen so that $\|w - z\|$ is minimal. This
contradicts the previous assertion to the effect that $\|w - p\| < \|w - z\|$, and
Proposition 5.26 follows.



Remark 5.29 Assume that $(V, \langle \cdot, \cdot \rangle)$ is a (finite-dimensional) inner product
space and $\|\cdot\|$ is the norm that comes from the inner product. Let $S$ be
a vector subspace of $V$, and let $w$ be any element of $V$. As in Section 4.7,
there is a point $z \in S$ such that $w - z \in S^\perp$, i.e., $w - z$ is orthogonal to
every element of $S$. This point $z$ is exactly the one for which $\|w - z\|$ is
minimal. Indeed, if $u$ is any other element of $S$, then we can write $w - u$ as
$(w - z) + (z - u)$, and $w - z$ and $z - u$ are orthogonal by the choice of $z$.
Hence

(5.30)    $\|w - u\| = (\|w - z\|^2 + \|z - u\|^2)^{1/2} \ge \|w - z\|$.

5.5 Clarkson's inequalities 

Let $n$ be a positive integer, and $p$ a real number such that $1 < p < \infty$.
Consider $\mathbf{C}^n$ equipped with the norm $\|\cdot\|_p$ defined in (4.1). If $p \ge 2$, then

(5.31)    $\|\tfrac{1}{2}(x + y)\|_p^p + \|\tfrac{1}{2}(x - y)\|_p^p \le \tfrac{1}{2}\|x\|_p^p + \tfrac{1}{2}\|y\|_p^p$

for all $x, y \in \mathbf{C}^n$. If $p \le 2$, and if $p'$ is the exponent conjugate to $p$ (so that
$1/p + 1/p' = 1$), then

(5.32)    $\|\tfrac{1}{2}(x + y)\|_p^{p'} + \|\tfrac{1}{2}(x - y)\|_p^{p'} \le \bigl(\tfrac{1}{2}\|x\|_p^p + \tfrac{1}{2}\|y\|_p^p\bigr)^{1/(p-1)}$

for all $x, y \in \mathbf{C}^n$. These are Clarkson's inequalities. See (15.5) on p. 225 and
(15.8) on p. 227 of [HewS].

Thus, for instance, if $\|x\|_p = \|y\|_p = 1$, then one gets an upper bound for
$\|x - y\|_p$ in terms of how close $\|(x + y)/2\|_p$ is to 1.
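As a sanity check, one can evaluate both sides of (5.31) and (5.32) for specific vectors. The following is only a numerical sketch: the function names `p_norm` and `clarkson_gap` and the sample vectors in $\mathbf{C}^3$ are assumptions for illustration, and the exponent in (5.32) is taken in the form $1/(p - 1) = p'/p$.

```python
def p_norm(x, p):
    """Return the l^p norm of a sequence of complex numbers, for finite p."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def clarkson_gap(x, y, p):
    """Right side minus left side of (5.31) if p >= 2, and of (5.32) if 1 < p < 2."""
    half_sum = [(a + b) / 2 for a, b in zip(x, y)]
    half_diff = [(a - b) / 2 for a, b in zip(x, y)]
    mean = 0.5 * p_norm(x, p) ** p + 0.5 * p_norm(y, p) ** p
    if p >= 2:
        lhs = p_norm(half_sum, p) ** p + p_norm(half_diff, p) ** p
        return mean - lhs
    q = p / (p - 1.0)  # conjugate exponent p'
    lhs = p_norm(half_sum, p) ** q + p_norm(half_diff, p) ** q
    return mean ** (1.0 / (p - 1.0)) - lhs

# Two arbitrary sample vectors in C^3.
x = [1 + 2j, -0.5j, 3.0]
y = [0.25, 2 - 1j, -1 + 1j]
gaps = {p: clarkson_gap(x, y, p) for p in (1.5, 2.0, 3.0, 4.0)}
```

Each gap should be nonnegative up to rounding error. For $p = 2$ both inequalities reduce to the parallelogram law, so the gap there should vanish up to rounding.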



Chapter 6 
Spectral theory 



In this chapter, vector spaces are assumed to be finite-dimensional. In Sections
6.1 - 6.3, vector spaces are assumed to be complex as well, while real
and complex vector spaces are considered in Sections 6.4 - 6.8.



6.1 The spectrum and spectral radius 

Let $V$ be a vector space, and let $T$ be a linear operator from $V$ to $V$. The
spectrum of $T$ is the set of complex numbers $\alpha$ such that $\alpha$ is an eigenvalue of $T$,
i.e., there is a nonzero vector $v$ in $V$ such that $T(v) = \alpha v$. In this case one
says that $v$ is an eigenvector of $T$ with eigenvalue $\alpha$. If $\alpha$ is an eigenvalue of
$T$, then

(6.1)    $\{v \in V : T(v) = \alpha v\}$

is called the eigenspace associated to the eigenvalue $\alpha$, and it is a vector
subspace of $V$.

Equivalently, $\alpha$ is not in the spectrum of $T$ exactly when $T - \alpha I$ is an
invertible linear operator on $V$, where $I$ denotes the identity transformation
on $V$. This uses the fact that a linear operator on $V$ is invertible if and only
if its kernel is trivial.

These notions also make sense for linear transformations on a real vector
space, but the next result is not true in general in that case. There are
substitutes for this, a basic instance of which is mentioned in Section 3.4.



Theorem 6.2 If $V$ is a vector space of nonzero dimension and $T$ is a linear
operator on $V$, then the spectrum of $T$ is nonempty.

To prove this, one can use the following characterization of the spectrum
of $T$: $\alpha$ lies in the spectrum of $T$ exactly when the determinant of $T - \alpha I$ is 0.
On the other hand, the determinant of $T - \alpha I$ is a polynomial in $\alpha$, and the
degree of the polynomial is equal to the dimension of $V$. The "Fundamental
Theorem of Algebra" states that for every nonconstant polynomial $P(z)$ on
the complex numbers there is at least one $a \in \mathbf{C}$ such that $P(a) = 0$. (See
[Rud1] for a proof.) Thus $\det(T - \alpha I)$ has at least one root, and the theorem
follows.

Note that the number of elements of the spectrum of $T$ is at most the
dimension of $V$. This can also be seen as a consequence of the fact that the
spectrum consists of the zeros of the polynomial $\det(T - \alpha I)$, or one can
observe that nonzero eigenvectors of $T$ corresponding to distinct eigenvalues
are automatically linearly independent.

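Given a linear operator $T$ on a vector space $V$, define the spectral radius
$\mathrm{Rad}(T)$ by

(6.3)    $\mathrm{Rad}(T) = \max\{|\alpha| : \alpha \in \mathbf{C},\ \alpha \text{ lies in the spectrum of } T\}$.

This is equivalent to

(6.4)    $\mathrm{Rad}(T) = \inf\{r \ge 0 : T - \alpha I \text{ is invertible for all } \alpha \in \mathbf{C} \text{ such that } |\alpha| > r\}$.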



6.2 Spectral radius and norms 

Let $V$ be a vector space, and let $\|\cdot\|$ be a norm on $V$. If $T$ is a linear
transformation on $V$, then we can define the operator norm $\|T\|_{op}$ of $T$ using
$\|\cdot\|$ on $V$, as in Section 4.4. More precisely, we are using $\|\cdot\|$ as the norm
on $V$ in the role of both $\|\cdot\|_1$ and $\|\cdot\|_2$ in the context of Section 4.4, i.e., for
both the domain and range.

Lemma 6.5 $\mathrm{Rad}(T) \le \|T\|_{op}$.

If we think of $\mathrm{Rad}(T)$ as being given by (6.3), then we can establish the
lemma as follows. Let $\alpha$ be an eigenvalue of $T$, and let $v \in V$ be a nonzero
eigenvector corresponding to $\alpha$. Thus $T(v) = \alpha v$, so that
$|\alpha| \|v\| = \|T(v)\| \le \|T\|_{op} \|v\|$, and hence $|\alpha| \le \|T\|_{op}$.
The lemma follows easily from this.

One can also approach the lemma through the following.

Lemma 6.6 If $A : V \to V$ is a linear mapping such that $\|A\|_{op} < 1$, then
$I - A$ is invertible, and

(6.7)    $\|(I - A)^{-1}\|_{op} \le (1 - \|A\|_{op})^{-1}$.

In the context of Lemma 6.5, one can apply Lemma 6.6 with $A = \alpha^{-1} T$
when $\alpha \in \mathbf{C}$, $|\alpha| > \|T\|_{op}$ (so that $\|A\|_{op} < 1$), to get that $I - \alpha^{-1} T$ is
invertible. This is the same as saying that $\alpha I - T$ is invertible when
$|\alpha| > \|T\|_{op}$, so that $\mathrm{Rad}(T) \le \|T\|_{op}$ by (6.4).



A famous technique for deriving Lemma 6.6 is the method of "Neumann
series". Specifically, one would like to obtain the inverse to $I - A$ by summing
the series

(6.8)    $\sum_{n=0}^\infty A^n$.

This series converges "absolutely" because

(6.9)    $\|A^n\|_{op} \le \|A\|_{op}^n$,

as in (4.30) in Section 4.4, and because $\|A\|_{op} < 1$ by hypothesis. By a
standard computation the sum of the series is the inverse of $I - A$. The
norm of the sum is less than or equal to the sum of the norms, and the latter
is bounded by a geometric series whose sum is $(1 - \|A\|_{op})^{-1}$.
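The Neumann series can be tried out on a small matrix. In the sketch below, the norm on $\mathbf{R}^2$ is assumed to be the max norm, for which the operator norm of a matrix is its largest absolute row sum; the helper names and the sample matrix `A` are hypothetical and chosen only for illustration.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def op_norm_inf(a):
    """Operator norm induced by the max norm on R^n: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

def neumann_inverse(a, terms=200):
    """Partial sum I + A + A^2 + ... + A^(terms-1) of the series (6.8)."""
    n = len(a)
    total = identity(n)
    power = identity(n)
    for _ in range(terms - 1):
        power = mat_mul(power, a)
        total = mat_add(total, power)
    return total

A = [[0.2, 0.3], [-0.1, 0.4]]             # largest absolute row sum is 0.5 < 1
S = neumann_inverse(A)                     # approximates (I - A)^(-1)
I_minus_A = [[1.0 - 0.2, -0.3], [0.1, 1.0 - 0.4]]
product = mat_mul(I_minus_A, S)            # should be close to the identity
bound = 1.0 / (1.0 - op_norm_inf(A))       # right side of (6.7)
```

Here `product` should agree with the identity matrix up to rounding, and `op_norm_inf(S)` should not exceed `bound`, in line with (6.7).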

Alternatively, one can first assert that $I - A$ is invertible because its kernel
is trivial, which amounts to the same argument as the initial one for Lemma
6.5. The inequality (6.7) is equivalent to the statement that

(6.10)    $\|(I - A)^{-1}(v)\| \le (1 - \|A\|_{op})^{-1} \|v\|$ for all $v \in V$.

This is in turn equivalent to saying that

(6.11)    $\|w\| \le (1 - \|A\|_{op})^{-1} \|(I - A)(w)\|$ for all $w \in V$.

To get this, one can compute as follows:

(6.12)    $\|w\| \le \|(I - A)(w)\| + \|A(w)\| \le \|(I - A)(w)\| + \|A\|_{op} \|w\|$.
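One way to spell out the passage from (6.12) to (6.11), under the standing hypothesis $\|A\|_{op} < 1$, is the following short rearrangement (a sketch of the routine algebra, not taken from the text):

```latex
\begin{align*}
\|w\| &\le \|(I - A)(w)\| + \|A\|_{op}\,\|w\| \\
(1 - \|A\|_{op})\,\|w\| &\le \|(I - A)(w)\| \\
\|w\| &\le (1 - \|A\|_{op})^{-1}\,\|(I - A)(w)\| .
\end{align*}
```

The division in the last step is legitimate because $1 - \|A\|_{op} > 0$. In particular, $(I - A)(w) = 0$ forces $w = 0$, which is the triviality of the kernel mentioned above.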



6.3 Spectral radius and norms, 2 

Let $V$ be a vector space with norm $\|\cdot\|$, and let $\|\cdot\|_{op}$ denote the corresponding
operator norm for linear transformations on $V$ as in the previous section.

Lemma 6.13 If $T$ is a linear transformation on $V$ and $n$ is a positive integer,
then $\mathrm{Rad}(T) \le \|T^n\|_{op}^{1/n}$.

Note that $\|T^n\|_{op}^{1/n}$ is always less than or equal to $\|T\|_{op}$, because of (4.30)
in Section 4.4.

Let us mention a couple of arguments for Lemma 6.13. For the first, let
$\alpha$ be any eigenvalue of $T$. Then $\alpha^n$ is an eigenvalue of $T^n$, with the same
eigenvector. Hence $|\alpha|^n \le \mathrm{Rad}(T^n)$. Applying Lemma 6.5 to $T^n$ instead of
$T$ we obtain that $|\alpha|^n \le \|T^n\|_{op}$, which is the same as $|\alpha| \le \|T^n\|_{op}^{1/n}$. Since
this holds for any eigenvalue of $T$, we may conclude that $\mathrm{Rad}(T) \le \|T^n\|_{op}^{1/n}$.

Alternatively, one can observe that $\alpha^n I - T^n$ is equal to the product of
$\alpha I - T$ and another operator (just as $x^n - y^n$ can be written as the product of
$x - y$ and another algebraic expression). This implies that $\alpha I - T$ is invertible
whenever $\alpha^n I - T^n$ is, and one can employ the characterization (6.4) of the
spectral radius.

Proposition 6.14 If $T$ is a linear transformation on the vector space $V$,
then the limit

(6.15)    $\lim_{n \to \infty} \|T^n\|_{op}^{1/n}$

exists and is equal to $\mathrm{Rad}(T)$.

This may seem a bit odd at first, since the spectral radius $\mathrm{Rad}(T)$ does
not depend on the norm on $V$, while (6.15) does involve the norm, but it
is not hard to show that (6.15) in fact also does not depend on the norm.
Specifically, any two norms on a finite-dimensional vector space are bounded
by positive constant multiples of each other, as in Section 4.1, and this leads
to a similar relation for the operator norms. The effects of these constants
wash out in (6.15), since $\lim_{n \to \infty} A^{1/n} = 1$ for any positive real number $A$
(as in [Rud1]).

One can use this type of remark together with Jordan canonical forms to 
prove the proposition. Let us now describe another argument. 

If $T$ is as in the proposition, then $(\alpha I - T)^{-1}$ exists for all $\alpha \in \mathbf{C}$ such that
$|\alpha| > \mathrm{Rad}(T)$. It will be convenient to reformulate this as saying that
$(I - \beta T)^{-1}$ exists for all $\beta \in \mathbf{C}$ such that $|\beta| < \mathrm{Rad}(T)^{-1}$. This includes $\beta = 0$, for
which invertibility is trivial. If $\mathrm{Rad}(T) = 0$, then we take $\mathrm{Rad}(T)^{-1} = +\infty$,
and $(I - \beta T)^{-1}$ exists for all $\beta \in \mathbf{C}$.

In fact, $(I - \beta T)^{-1}$ is a rational function on $\mathbf{C}$ with poles lying outside
the disk $\{\beta \in \mathbf{C} : |\beta| < \mathrm{Rad}(T)^{-1}\}$. More precisely, it is a rational function
with values in the vector space of linear mappings on $V$, but this is not too
serious, since the vector space of linear mappings on $V$ has finite dimension
(since $V$ is finite-dimensional).

Recall that a complex-valued function of a complex variable $z$ is said to
be rational if it can be written as $P(z)/Q(z)$, where $P(z)$ and $Q(z)$ are polynomials
in $z$ (with complex coefficients) and $Q(z)$ is not identically 0. The
points at which $Q(z)$ vanishes are called the poles of the rational function,
except to the extent that they can be cancelled with roots of $P(z)$ (in such a
way that the function can be rewritten as a quotient of polynomials in which
the denominator does not vanish at the point in question).

To say that $(I - \beta T)^{-1}$ is a rational function of $\beta$ means that it can
be written as a finite sum of functions which are each a complex-valued
rational function of $\beta$ times a fixed linear mapping on $V$. This is true in this
case because of Cramer's rule, which allows $(I - \beta T)^{-1}$ to be expressed as
$\det(I - \beta T)^{-1}$ times a linear transformation on $V$ defined in terms of minors
of $I - \beta T$. These minors are polynomials in $\beta$, as is $\det(I - \beta T)$ (which is also
not identically 0 since it is equal to 1 at $\beta = 0$). The poles of this rational
function come from the zeros of $\det(I - \beta T)$, and they lie outside of the disk
$\{\beta \in \mathbf{C} : |\beta| < \mathrm{Rad}(T)^{-1}\}$ by assumption.

Because the poles of this rational function lie outside of this disk, this
rational function of $\beta$ is equal to a convergent power series in $\beta$ on this disk
(centered at 0). This works for any rational function, and more generally for
any complex-analytic function on a disk.

The form of the power series for $(I - \beta T)^{-1}$ about the origin is clear, and
is given by $\sum_{n=0}^\infty \beta^n T^n$. We conclude that this series converges for all $\beta \in \mathbf{C}$
such that $|\beta| < \mathrm{Rad}(T)^{-1}$. This implies that the coefficients of the series do
not grow too fast in a certain sense. This works in a general way, as discussed
in [Rud1]; for rational functions, one has even more information about the
coefficients, but for the moment it is only the approximate size that is needed.
An adequate bottom line is that for every real number $r > \mathrm{Rad}(T)$, there is
a positive constant $L$ (which may depend on $r$) such that

(6.16)    $\|T^n\|_{op} \le L r^n$

for all $n \ge 0$.

We can rewrite this inequality as

(6.17)    $\|T^n\|_{op}^{1/n} \le L^{1/n} r$.

Using this and Lemma 6.13, it is not hard to show that the limit (6.15) exists and is
equal to $\mathrm{Rad}(T)$. Of course it is important too that $r$ above can be taken to
be any real number larger than $\mathrm{Rad}(T)$.
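The convergence of $\|T^n\|_{op}^{1/n}$ to $\mathrm{Rad}(T)$ can be slow when $\|T\|_{op}$ is much larger than $\mathrm{Rad}(T)$, as the following numerical sketch suggests. It assumes the max norm on $\mathbf{C}^2$, whose operator norm is the largest absolute row sum; the matrix `T` and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def op_norm_inf(a):
    """Operator norm induced by the max norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

def root_norms(t, n_max):
    """Return the list of ||T^n||^(1/n) for n = 1, ..., n_max."""
    values = []
    power = t
    for n in range(1, n_max + 1):
        values.append(op_norm_inf(power) ** (1.0 / n))
        power = mat_mul(power, t)
    return values

# Upper triangular, so the only eigenvalue is 0.5 and Rad(T) = 0.5,
# while the operator norm of T itself is 100.5.
T = [[0.5, 100.0], [0.0, 0.5]]
values = root_norms(T, 400)
```

For this matrix $\|T^n\|_{op} = 0.5^n (1 + 200 n)$, so every entry of `values` stays above $0.5$, as Lemma 6.13 requires, and the entries decrease toward $0.5$ as $n$ grows.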

In the preceding discussion of upper bounds for $\|T^n\|_{op}$, we are not using
special features of the operator norm $\|\cdot\|_{op}$ for linear transformations on $V$.
In effect, this is accommodated by the constant $L$. One can think in terms
of starting with computations for the complex-valued setting; one can deal
with different scalar components separately, and then combine them to get
results for linear transformations. The final packaging in terms of $\|\cdot\|_{op}$,
instead of some other norm, only affects the constant $L$.

Remark 6.18 If $a_n = \|T^n\|_{op}^{1/n}$, then

(6.19)    $a_{m+n} \le a_m^{m/(m+n)} a_n^{n/(m+n)}$

for all positive integers $m$ and $n$. This follows from the fact that the operator
norm of a product of operators is less than or equal to the product of the
operator norms. (Note that this applies equally well to operators on real or
complex normed vector spaces.)

Now suppose that $\{a_n\}_{n=1}^\infty$ is simply any sequence of nonnegative real
numbers such that (6.19) holds for all positive integers $m$ and $n$. This is a
multiplicative convexity property; if $a_n > 0$ for all $n$, then we can rewrite it
as

(6.20)    $\log a_{m+n} \le \frac{m}{m+n} \log a_m + \frac{n}{m+n} \log a_n$,

as in ordinary "additive" convexity. (If $a_n = 0$ for some $n$, then $a_j = 0$ for all
$j > n$, and one might as well not worry about these $a_j$'s too much anyway.)

A well-known result states that $\lim_{n \to \infty} a_n$ automatically exists under this
convexity condition. To show this, it is enough to verify that
$\limsup_{n \to \infty} a_n \le \liminf_{n \to \infty} a_n$. Let us first observe that

(6.21)    $a_{jn} \le a_n$

for all positive integers $j$ and $n$. This can be derived from (6.19) using
induction. Next, if $1 \le i < n$, then

(6.22)    $a_{jn+i} \le a_{jn}^{jn/(jn+i)} a_i^{i/(jn+i)} \le a_n^{jn/(jn+i)} a_i^{i/(jn+i)}$.

Once one has this, it is not difficult to finish the argument.



6.4 Inner product spaces

In this section, vector spaces are assumed to be finite-dimensional, but both 
real and complex vector spaces are allowed. 

Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space, as in Section 4.6, real or complex.
If $T : V \to V$ is a linear transformation, then there is a unique linear
transformation $T^* : V \to V$ such that

(6.23)    $\langle T(v), w \rangle = \langle v, T^*(w) \rangle$

for all $v, w \in V$. This is called the adjoint of $T$, and when $V$ is a real vector
space, $T^*$ is sometimes called the transpose of $T$. Specifically, if we fix an
orthonormal basis for $V$, and express $T$ in terms of a matrix relative to this
orthonormal basis, then $T^*$ corresponds to the transpose of the matrix for $T$
when $V$ is a real vector space, and $T^*$ corresponds to the conjugate transpose
of the matrix for $T$ in the complex case. (Recall that if $\{t_{j,k}\}$ is a matrix,
then the matrix $\{s_{l,m}\}$ is the transpose of $\{t_{j,k}\}$ if $l$ and $m$ run through the
same ranges of indices for both $s_{l,m}$ and $t_{m,l}$, and if $s_{l,m} = t_{m,l}$ for all $l$ and $m$.
The matrix $\{u_{l,m}\}$ is the conjugate transpose of $\{t_{j,k}\}$ if $l$ and $m$ run through
the same ranges of indices for both $u_{l,m}$ and $t_{m,l}$, and if $u_{l,m} = \overline{t_{m,l}}$ for all $l$
and $m$, where $\overline{a}$ denotes the complex conjugate of $a$.) It is easy to see that
$T^*$ is uniquely determined by (6.23), and in particular different orthonormal
bases for $V$ lead to the same linear transformation $T^*$ when one computes
$T^*$ in terms of matrices.

Let $\|\cdot\|$ be the norm on $V$ associated to the inner product $\langle \cdot, \cdot \rangle$, and let
$\|\cdot\|_{op}$ denote the operator norm for linear transformations on $V$ defined using
$\|\cdot\|$. It is not hard to check that

(6.24)    $\|T\|_{op} = \sup\{|\langle T(v), w \rangle| : v, w \in V,\ \|v\| = \|w\| = 1\}$

for any linear transformation $T$ on $V$. (This uses the Cauchy-Schwarz inequality
(4.36) to show that $\|T\|_{op}$ is greater than or equal to the right side,
and the choice of $w$ equal to a scalar multiple of $T(v)$ (unless $T(v) = 0$) for
the other inequality.) One also has that

(6.25)    $\|T^*\|_{op} = \|T\|_{op}$,

as can easily be derived from (6.24).



Note that $I^* = I$, where $I$ denotes the identity transformation. If $S$ and
$T$ are linear transformations on $V$ and $a$, $b$ are scalars, then

(6.26)    $(a S + b T)^* = a S^* + b T^*$

when $V$ is a real vector space and

(6.27)    $(a S + b T)^* = \overline{a} S^* + \overline{b} T^*$

when $V$ is a complex vector space. Also,

(6.28)    $(T^*)^* = T$

and

(6.29)    $(S T)^* = T^* S^*$

(in both cases). These properties can be verified in a straightforward manner.

From (6.29) it follows that if $T$ is invertible, then $T^*$ is invertible as well,
and

(6.30)    $(T^*)^{-1} = (T^{-1})^*$.

It follows that $T - \lambda I$ is invertible if and only if $T^* - \lambda I$ is invertible in the
real case, and if and only if $T^* - \overline{\lambda} I$ is invertible in the complex case. Thus
$\lambda$ is in the spectrum of $T$ if and only if $\overline{\lambda}$ is in the spectrum of $T^*$.

A linear transformation $T$ on $V$ is said to be self-adjoint if $T^* = T$. When
$V$ is a real vector space, one might call $T$ symmetric in this event. Sums of
self-adjoint linear transformations are again self-adjoint, as are products of
self-adjoint linear transformations by real numbers. The eigenvalues of a
self-adjoint linear transformation on a complex inner product space are always
real numbers. This can be derived from the observations in the preceding
paragraph, and one can also compute as follows. If $v$ is a nonzero vector in
$V$ which is an eigenvector of $T$ with eigenvalue $\lambda$, and if $T$ is self-adjoint,
then

(6.31)    $\lambda \langle v, v \rangle = \langle \lambda v, v \rangle = \langle T(v), v \rangle = \langle v, T^*(v) \rangle = \langle v, T(v) \rangle = \langle v, \lambda v \rangle = \overline{\lambda} \langle v, v \rangle$,

and hence $\lambda = \overline{\lambda}$.

If $T$ is a self-adjoint linear transformation on $V$ and $v_1$, $v_2$ are vectors in
$V$ which are eigenvectors of $T$ with eigenvalues $\lambda_1$, $\lambda_2$, respectively, and if
$\lambda_1 \ne \lambda_2$, then $v_1$ and $v_2$ are orthogonal to each other. This is because

(6.32)    $\lambda_1 \langle v_1, v_2 \rangle = \langle \lambda_1 v_1, v_2 \rangle = \langle T(v_1), v_2 \rangle = \langle v_1, T(v_2) \rangle = \langle v_1, \lambda_2 v_2 \rangle = \lambda_2 \langle v_1, v_2 \rangle$,

so that $\langle v_1, v_2 \rangle$ is necessarily 0.

An important feature of self-adjoint linear transformations is that they
can be diagonalized in an orthonormal basis. Let us recall how this can be
proved. A first step is to show that there is at least one nonzero eigenvector.
On a complex vector space, this is true for any linear transformation, as in
Theorem 6.2 in Section 6.1. For a self-adjoint operator $T$ one can also obtain
this by maximizing or minimizing $\langle T(v), v \rangle$ over the unit sphere ($\|v\| = 1$),
and this works in both the real and the complex cases. More precisely, one can
check that the eigenvectors of $T$ with norm 1 are exactly the critical points
of the real-valued function $\langle T(v), v \rangle$ on the unit sphere. (Alternatively, one
might prefer to phrase this in terms of Lagrange multipliers.) The existence
of points in the unit sphere at which the maximum or minimum of $\langle T(v), v \rangle$
is attained follows from general results about continuity and compactness,
and these points are automatically critical points.

Next, suppose that $T$ is a self-adjoint operator and that $v$ is an eigenvector
of $T$ with eigenvalue $\lambda$. The orthogonal complement of $v$ in $V$ is the subspace
of vectors $w \in V$ such that $\langle v, w \rangle = 0$, as in Section 4.7. The self-adjointness
of $T$ implies that $T$ maps the orthogonal complement of $v$ to itself, since

(6.33)    $\langle v, T(w) \rangle = \langle T(v), w \rangle = \langle \lambda v, w \rangle = 0$

when $\langle v, w \rangle = 0$.

One can restrict T to the orthogonal complement of v and repeat the 
process, to get a new eigenvector which is orthogonal to the previous one, 
etc. In this way one can show that there is an orthonormal basis for the 
whole space which consists of eigenvectors of T, as desired. 

Now let us assume for a moment that we are working with a complex 
inner product space. If T is a linear operator on the vector space, then 
T can be diagonalized in an orthogonal basis if and only if T is normal, 
which means that T and T* commute (TT* = T*T). Indeed, if T can be 
so diagonalized, then T* is diagonalized by the same basis, and T and T* 
commute. Conversely, any linear operator $T$ can be written as $A + i B$, where
$A$ and $B$ are self-adjoint, by taking $A = (T + T^*)/2$ and $B = (T - T^*)/(2i)$.
Thus A and B can be diagonalized by orthonormal bases, but in general the 
bases are not the same. However, if T is normal, then A and B commute, 
and one can check that the eigenspaces of A and B are invariant under each 
other. This permits one to choose an orthogonal basis in which both are 
diagonalized, and hence in which T is diagonalized. (See also Section |^ 
concerning this last point.)

As a consequence of the diagonalization, if $T$ is a self-adjoint or normal
linear transformation on a complex inner product space, then the spectral
radius of $T$ is equal to the norm of $T$. (See also Section 6.5.) An analogous
statement holds for self-adjoint linear operators on real inner product spaces
(for which we have not officially defined the spectral radius).

On either a real or complex inner product space, suppose that T is a 
linear operator such that 

(6.34)    $\langle T(v), T(w) \rangle = \langle v, w \rangle$

for all v, w in the vector space. This is equivalent to asking that

(6.35)    $\|T(v)\| = \|v\|$

for all v, because of polarization, as in Remark 4.41 in Section 4.6. This
condition is also equivalent to

(6.36)    $T^* = T^{-1}$.

The linear operators which satisfy this property are called orthogonal
transformations in the case of real inner product spaces, and unitary
transformations in the case of complex inner product spaces. A unitary
transformation T on a complex inner product space is normal in the sense
defined above, since T automatically commutes with $T^{-1}$, and hence T can
be diagonalized in an orthonormal basis.
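
As a small numerical illustration of (6.34) and (6.36), here is a minimal sketch in Python, assuming the standard inner product on R^2 and taking T to be a rotation. The angle and the test vectors are arbitrary choices made only for this example.

```python
import math

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def apply(T, v):
    """Apply the matrix T (a list of rows) to the vector v."""
    return [inner(row, v) for row in T]

def transpose(T):
    return [list(col) for col in zip(*T)]

def matmul(A, B):
    """Matrix product: rows of A against columns of B."""
    Bt = transpose(B)
    return [[inner(row, col) for col in Bt] for row in A]

theta = 0.7  # arbitrary angle, chosen only for illustration
T = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

v, w = [1.0, 2.0], [-3.0, 0.5]

# (6.34): the inner product of T(v) and T(w) equals that of v and w.
lhs = inner(apply(T, v), apply(T, w))
rhs = inner(v, w)

# (6.36): T* T = I, where T* is the transpose in the real case.
TtT = matmul(transpose(T), T)
```

Up to rounding error, `lhs` agrees with `rhs` and `TtT` is the identity matrix.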



6.5 The C*-identity 

Let $(V, \langle \cdot, \cdot \rangle)$ be a finite-dimensional inner product
space, real or complex, and let T be a linear operator on V. The C*-identity
states that

(6.37)    $\|T^* T\|_{op} = \|T\|_{op}^2$,

where $\|\cdot\|_{op}$ denotes the operator norm of a linear operator on V,
relative to the norm $\|\cdot\|$ on V associated to the inner product. To see
this, notice first that

(6.38)    $\|T^* T\|_{op} \le \|T^*\|_{op} \, \|T\|_{op} = \|T\|_{op}^2$,

using (6.25) in the second step. On the other hand,

(6.39)    $\|T^* T\|_{op} \ge \|T\|_{op}^2$

can be derived from (6.24) applied to T*T. Thus we get (6.37). Similarly,
or by replacing T with T* in (6.37), one can obtain that

(6.40)    $\|T T^*\|_{op} = \|T\|_{op}^2$.

As a special case, if T is self-adjoint, then

(6.41)    $\|T^2\|_{op} = \|T\|_{op}^2$.

For any operator T we have the inequality

(6.42)    $\|T^2\|_{op} \le \|T\|_{op}^2$,

but the reverse inequality is not true in general. For instance, $T^2$ could
be the zero operator while T is not.

If T is self-adjoint, then so is $T^j$ for any positive integer j, since
$(T^j)^* = (T^*)^j = T^j$. Thus we can apply (6.41) to $T^j$ to obtain

(6.43)    $\|T^{2j}\|_{op} = \|T^j\|_{op}^2$.

It follows that

(6.44)    $\|T^l\|_{op} = \|T\|_{op}^l$

when l is a power of 2, and from this one can verify that equality holds for
all positive integers l. (Exercise, using the fact that the norm of a product
is always less than or equal to the product of the norms.)

From here it follows in turn that if V is a complex vector space, then the
spectral radius of T is equal to $\|T\|_{op}$, by Proposition 6.14. This
and the preceding identities are also simple consequences of a diagonalization
of T in an orthonormal basis, as in the previous section.
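
The identities above can be checked numerically on 2x2 real matrices. The following is a minimal sketch, assuming the Euclidean norm on R^2; the helper `op_norm` and the two sample matrices are illustrative choices, not part of the text. The nilpotent matrix N shows that the C*-identity holds for N while the inequality (6.42) is strict, and the symmetric matrix S illustrates (6.41).

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def op_norm(A):
    """Operator norm of a real 2x2 matrix for the Euclidean norm.

    This is the square root of the largest eigenvalue of A^T A, which
    for a symmetric 2x2 matrix follows from its trace and determinant.
    """
    M = matmul(transpose(A), A)
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(tr * tr - 4.0 * det, 0.0)
    return math.sqrt((tr + math.sqrt(disc)) / 2.0)

N = [[0.0, 1.0], [0.0, 0.0]]   # nilpotent: N^2 = 0, not self-adjoint
S = [[2.0, 1.0], [1.0, -1.0]]  # symmetric, hence self-adjoint

cstar_N = op_norm(matmul(transpose(N), N))  # equals op_norm(N)**2, as in (6.37)
square_N = op_norm(matmul(N, N))            # 0, strictly less than op_norm(N)**2
square_S = op_norm(matmul(S, S))            # equals op_norm(S)**2, as in (6.41)
```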

Now suppose that T is normal, i.e., T*T = TT*. For any positive integer
l we have that

(6.45)    $\|T^l\|_{op}^2 = \|(T^l)^* T^l\|_{op}$,

by the C*-identity. Because T and T* commute,

(6.46)    $(T^l)^* T^l = (T^*)^l T^l = (T^* T)^l$,

and hence

(6.47)    $\|(T^l)^* T^l\|_{op} = \|(T^* T)^l\|_{op} = \|T^* T\|_{op}^l = \|T\|_{op}^{2l}$.

The second step uses the fact that T*T is self-adjoint, so that the earlier
formulae are applicable. Therefore,

(6.48)    $\|T^l\|_{op} = \|T\|_{op}^l$

for all positive integers l.

As before, when V is a complex vector space, this implies that the spectral
radius of T is equal to the norm of T. In this case, this statement and (6.48)
also follow from a diagonalization of T in an orthonormal basis.



6.6 Projections 

In this section vector spaces are again permitted to be real or complex. 

Definition 6.49 Let V be a vector space. A linear operator $P : V \to V$ is
said to be a projection if $P^2 = P$. Two subspaces $V_1$ and $V_2$ of V are
said to be complementary or complements of each other if
$V_1 \cap V_2 = \{0\}$ and $\mathrm{span}(V_1, V_2) = V$.

Observe that I - P is automatically a projection when P is, since
$(I - P)^2 = I - P - P + P^2 = I - P$. The kernel of I - P is equal to the
image of P, and the image of I - P is equal to the kernel of P.

Lemma 6.50 Two subspaces $V_1$, $V_2$ of a vector space V are complementary
if and only if for each $v \in V$ there are unique vectors $v_1 \in V_1$ and
$v_2 \in V_2$ such that $v = v_1 + v_2$. In this event the mappings
$v \mapsto v_1$ and $v \mapsto v_2$ are linear, and they define projections
from V onto $V_1$ along $V_2$ and from V onto $V_2$ along $V_1$, respectively.

If $P : V \to V$ is a linear mapping which is a projection, then the kernel
and image of P are complementary subspaces of V, and P is determined
uniquely by its kernel and image.

(Exercise.) 

Let us say that P is the projection of V onto $V_2$ along $V_1$ when P is a
projection on V with kernel $V_1$ and image $V_2$. Of course I - P is then the
projection of V onto $V_1$ along $V_2$.

If P is a projection, then the nonzero vectors in the kernel of P are
eigenvectors of P with eigenvalue 0, and the nonzero vectors in the image
of P are eigenvectors of P with eigenvalue 1. The spectrum of P consists
exactly of 0 and 1, except in the degenerate cases where P = 0 or P = I, for
which the spectrum consists of only 0 or only 1, respectively (unless V has
dimension 0).

If $(V, \langle \cdot, \cdot \rangle)$ is an inner product space, then for each
subspace W of V there is a special projection of V onto W, namely, the
orthogonal projection, as in Section 4.7. In the terminology above, this is
the projection of V onto W along the orthogonal complement $W^\perp$ of W.
One can rephrase this by saying that orthogonal projections are exactly the
same as projections in which the kernel and image of the operator are
orthogonal complements of each other.

Lemma 6.51 A projection P on an inner product space
$(V, \langle \cdot, \cdot \rangle)$ is an orthogonal projection if and only if
it is self-adjoint.

Suppose first that P is an orthogonal projection. Let u and v be arbitrary
elements of V. Then (I - P)(v) is orthogonal to P(u), i.e.,

(6.52)    $\langle P(u), (I - P)(v) \rangle = 0$.

This is the same as saying that

(6.53)    $\langle P(u), v \rangle = \langle P(u), P(v) \rangle$.

Similarly, (I - P)(u) is orthogonal to P(v), so that

(6.54)    $\langle P(u), P(v) \rangle = \langle u, P(v) \rangle$.

Hence $\langle P(u), v \rangle = \langle u, P(v) \rangle$, as desired.

Conversely, if P is a self-adjoint projection on V, then

(6.55)    $\langle P(u), (I - P)(v) \rangle = \langle u, P((I - P)(v)) \rangle = 0$

for all $u, v \in V$. This shows that the image of P is orthogonal to the
kernel of P, so that P is an orthogonal projection.

Lemma 6.56 A nonzero projection P on an inner product space
$(V, \langle \cdot, \cdot \rangle)$ is an orthogonal projection if and only if
$\|P\|_{op} = 1$.

The "only if" part of this lemma was mentioned at the end of Section 4.7.
For the "if" part, suppose that P is a projection on V with operator norm
1. The operator norm of any nonzero projection is greater than or equal to
1, since there are nonzero vectors which are mapped to themselves by the
projection. Thus the real content of the hypothesis that $\|P\|_{op} = 1$ is
that

(6.57)    $\|P(v)\| \le \|v\|$ for all $v \in V$.

Let u, w be elements of V such that u is in the kernel of P and w is in
the range of P. We would like to show that u and w are orthogonal to each
other. Assume for the sake of a contradiction that this is not the case, i.e.,

(6.58)    $\langle u, w \rangle \ne 0$.

We may as well assume that $\langle u, w \rangle$ is a positive real number,
since this can be arranged by multiplying u or w by a scalar.

Set v = w - tu, where t is a positive real number, to be specified in a
moment. Note that P(v) = w, independently of t. On the other hand,

(6.59)    $\|v\|^2 = \langle v, v \rangle = \langle w, w \rangle - 2t \langle u, w \rangle + t^2 \langle u, u \rangle$
                  $= \|w\|^2 - 2t \langle u, w \rangle + t^2 \|u\|^2$.

Because $\langle u, w \rangle > 0$, it follows that

(6.60)    $\|v\|^2 < \|w\|^2 = \|P(v)\|^2$

when t > 0 is sufficiently small. This contradicts (6.57), and the lemma
follows.
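
Lemmas 6.51 and 6.56 can be seen concretely in R^2 with the Euclidean norm. The sketch below is only an illustration under that assumption: the family `oblique_projection(a)` and the helper `op_norm` are names introduced here. Each matrix projects onto the x-axis along the line spanned by (-a, 1); it is self-adjoint, and has norm 1, exactly when a = 0, and otherwise its norm is $\sqrt{1 + a^2} > 1$.

```python
import math

def oblique_projection(a):
    """Matrix of the projection of R^2 onto the x-axis along span{(-a, 1)}."""
    return [[1.0, a], [0.0, 0.0]]

def matmul(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def op_norm(A):
    """Euclidean operator norm of a real 2x2 matrix, via eigenvalues of A^T A."""
    M = matmul(transpose(A), A)
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(tr * tr - 4.0 * det, 0.0)
    return math.sqrt((tr + math.sqrt(disc)) / 2.0)

for a in (0.0, 0.5, 2.0):
    P = oblique_projection(a)
    is_projection = matmul(P, P) == P      # P^2 = P for every a
    is_self_adjoint = transpose(P) == P    # holds only when a = 0
    print(a, is_projection, is_self_adjoint, op_norm(P))
```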

Now suppose that V is a vector space equipped with a norm $\|\cdot\|$, which
is not required to come from an inner product. A nonzero projection on V
has operator norm greater than or equal to 1, for the same reason as before,
but in general it is not as easy to obtain projections with norm 1, or even
with a good bound on the norm, as in the case of inner product spaces. If
we take V to be $\mathbf{R}^n$ or $\mathbf{C}^n$, with norm $\|\cdot\|_p$ as in
Section 4.1, then the usual "coordinate projections" have norm 1. These are
the projections defined by replacing certain coordinates of a vector
$v = (v_1, v_2, \ldots, v_n)$ with 0 while leaving the other coordinates
unchanged.

As another special case, let $(V, \|\cdot\|)$ be any normed vector space, and
let z be any vector in V with $\|z\| = 1$. As in Section 4.3, there is a linear
functional $\lambda$ on V such that $\|\lambda\|_* = 1$ and $\lambda(z) = 1$.
Define $P : V \to V$ by

(6.61)    $P(v) = \lambda(v) \, z$.

It is easy to verify that P is a projection of V onto the 1-dimensional subspace
generated by z with operator norm 1.
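
For the reader's convenience, the verification can be written out as follows. This is a short sketch that writes $\lambda$ for the linear functional in (6.61) and uses only $\lambda(z) = 1$, $\|\lambda\|_* = 1$, and $\|z\| = 1$.

```latex
\begin{align*}
P(P(v)) &= \lambda\bigl(\lambda(v)\,z\bigr)\,z
         = \lambda(v)\,\lambda(z)\,z = \lambda(v)\,z = P(v), \\
\|P(v)\| &= |\lambda(v)|\,\|z\| \le \|\lambda\|_*\,\|v\|\,\|z\| = \|v\|, \\
P(z) &= \lambda(z)\,z = z .
\end{align*}
```

The first line shows that P is a projection, the second that its operator norm is at most 1, and the third that the norm is attained at z and that the image of P is the line generated by z.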



6.7 Remarks about diagonalizable operators 

Let V be a real or complex vector space, and let T be a linear operator on V.
For each eigenvalue $\lambda$ of T, let $E(T, \lambda)$ denote the
corresponding eigenspace, so that

(6.62)    $E(T, \lambda) = \{ v \in V : T(v) = \lambda v \}$.

Suppose that $\lambda_1, \ldots, \lambda_l$ are distinct eigenvalues of T. If
$v_1, \ldots, v_l$ are vectors in V such that $v_j \in E(T, \lambda_j)$ for
each j and

(6.63)    $\sum_{j=1}^{l} v_j = 0$,

then $v_j = 0$ for each j. This is not hard to prove using induction on l, for
instance.
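
One way to organize the induction is sketched below; the auxiliary vectors $u_j$ are notation introduced only for this sketch. Apply $T - \lambda_l I$ to the sum in (6.63), which kills the last term:

```latex
\begin{align*}
0 = (T - \lambda_l I)\Bigl(\sum_{j=1}^{l} v_j\Bigr)
  = \sum_{j=1}^{l-1} (\lambda_j - \lambda_l)\, v_j
  = \sum_{j=1}^{l-1} u_j,
\qquad u_j = (\lambda_j - \lambda_l)\, v_j \in E(T, \lambda_j).
\end{align*}
```

By the induction hypothesis applied to $\lambda_1, \ldots, \lambda_{l-1}$, each $u_j$ is 0. Since $\lambda_j \ne \lambda_l$ for $j < l$, this gives $v_j = 0$ for $j < l$, and then (6.63) forces $v_l = 0$ as well. The case l = 1 is trivial.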

If V is equipped with an inner product $\langle v, w \rangle$ and T is
self-adjoint with respect to this inner product, then eigenvectors of T
associated to distinct eigenvalues are orthogonal. This was mentioned in
Section 6.4, in the paragraph containing (6.32). The linear independence
property described in the preceding paragraph provides a version of this for
linear operators in general, without inner products or self-adjointness.

Let us say that T is diagonalizable if there is a basis of V consisting of 
eigenvectors of T. Thus, if one represents T by a matrix using this basis, 
then the matrix would be a diagonal matrix (with nonzero entries only along 
the diagonal). 

If T is diagonalizable, then clearly V is spanned by the set of eigenvectors 
of T. The converse is also true, but requires a bit more care. Specifically, 
one can get a basis of V consisting of eigenvectors of T by choosing arbitrary 
bases for the eigenspaces of T, and then combining them to get a basis 
for all of V. That the union of the bases of the eigenspaces is a linearly 
independent set of vectors in V can be derived from the observation above, 
in the paragraph containing ( |6.63| ), and from the linear independence of each 
of the bases of the eigenspaces separately. The fact that the union of the 
bases of the eigenspaces of T spans V exactly follows from the assumption 
that V is spanned by the eigenvectors of T. 



6.8 Commuting families of operators

Let V be a real or complex vector space, and let $\mathcal{F}$ be a family of
linear operators on V. We assume that every operator T in $\mathcal{F}$ is
diagonalizable, and that $\mathcal{F}$ is a commuting family of operators,
which means that

(6.64)    $S \circ T = T \circ S$

for all S, T in $\mathcal{F}$. Under these conditions, the operators in
$\mathcal{F}$ can be simultaneously diagonalized. In other words, there is a
basis $v_1, \ldots, v_n$ of V such that each $v_i$ is an eigenvector for every
operator in $\mathcal{F}$.

To see this, let us begin with the following observation. Suppose that T
is an operator in $\mathcal{F}$, and that $\lambda$ is an eigenvalue of T. Let
$E(T, \lambda)$ denote the corresponding eigenspace of T, as in Section 6.7.
If S is any other operator in $\mathcal{F}$, then

(6.65)    $S(E(T, \lambda)) \subseteq E(T, \lambda)$.

Indeed, if $v \in E(T, \lambda)$, then

(6.66)    $T(S(v)) = S(T(v)) = S(\lambda v) = \lambda S(v)$,

and hence $S(v) \in E(T, \lambda)$. Of course we used the information that S
and T commute in the first step in this computation.

Now let $\lambda_1, \ldots, \lambda_k$ be a list of all of the eigenvalues of
T, without repetitions. The corresponding eigenspaces $E(T, \lambda_j)$ may
have dimension larger than 1, because of multiplicities. For each vector v in
V, there exist unique vectors $v_j$ in $E(T, \lambda_j)$, j = 1, ..., k, such
that

(6.67)    $v = \sum_{j=1}^{k} v_j$.

The existence of the $v_j$'s comes from the fact that V is spanned by the
eigenvectors of T, since T is diagonalizable, and the uniqueness follows from
the observation in the paragraph containing (6.63).

For each j = 1, ..., k, the correspondence $v \mapsto v_j$ determines a
well-defined mapping on V, which is in fact linear, and a projection from V
onto $E(T, \lambda_j)$. This uses the existence and the uniqueness of the
decomposition in (6.67). Let us denote this projection by $P_j$.



If S is in $\mathcal{F}$, then

(6.68)    $S \circ P_j = P_j \circ S$

for each j. To see this, we can compute as follows. Given v in V, decompose
v into a sum of $v_j \in E(T, \lambda_j)$ as above. Thus

(6.69)    $S(v) = \sum_{j=1}^{k} S(v_j)$,

and each $S(v_j)$ lies in $E(T, \lambda_j)$, by (6.65). This implies (6.68),
since the decompositions are unique.

Let $\alpha$ be any eigenvalue of $S \in \mathcal{F}$, and $E(S, \alpha)$ the
corresponding eigenspace. Notice that

(6.70)    $P_j(E(S, \alpha)) \subseteq E(S, \alpha)$.

This is the analogue of (6.65), with T replaced by S, $\lambda$ by $\alpha$,
and S by $P_j$. The main point is that S and $P_j$ commute, as in (6.68),
which is all that we needed before.

In other words, (6.70) says that if v is an eigenvector of S with eigenvalue
$\alpha$, and if we decompose v into eigenvectors of T, as in (6.67), then the
nonzero components of this decomposition are eigenvectors of S with eigenvalue
$\alpha$.

We now leave the rest of the argument as an exercise. 
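
A small concrete instance may help with the exercise. The sketch below uses two commuting 3x3 matrices chosen for illustration (they are not from the text): T has a 2-dimensional eigenspace, so not every eigenvector of T is an eigenvector of S, but a common eigenbasis exists. The helper `eigenvalue_of` is a name introduced here.

```python
def apply(M, v):
    """Apply the square matrix M (list of rows) to the vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def eigenvalue_of(M, v, tol=1e-12):
    """Return lam with M v = lam v, or None if v is not an eigenvector of M."""
    Mv = apply(M, v)
    k = next(i for i, x in enumerate(v) if x != 0)  # a nonzero coordinate of v
    lam = Mv[k] / v[k]
    ok = all(abs(Mv[i] - lam * v[i]) <= tol for i in range(len(v)))
    return lam if ok else None

T = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]   # E(T, 1) is 2-dimensional
S = [[0, 1, 0], [1, 0, 0], [0, 0, 5]]   # commutes with T, preserves E(T, 1)

# A basis of simultaneous eigenvectors, found by diagonalizing S on E(T, 1).
common_basis = [[1, 1, 0], [1, -1, 0], [0, 0, 1]]
pairs = [(eigenvalue_of(T, v), eigenvalue_of(S, v)) for v in common_basis]
```

Here (1, 0, 0) is an eigenvector of T but not of S, which is why one has to work inside the eigenspaces of T rather than with an arbitrary eigenbasis of T.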

Let us consider a variant of this, where V is an inner product space,
and $\mathcal{F}$ consists of self-adjoint operators (which are then
automatically diagonalizable). In this case one can get a simultaneous
diagonalization by an orthogonal basis. This can be obtained using the
orthogonality of the individual eigenspaces involved. There are also slightly
different arguments that one can consider, where one goes from (6.65) to
saying that the restriction of an $S \in \mathcal{F}$ to an eigenspace of T is
self-adjoint as an operator on that subspace.

In the context of complex inner product spaces, one can look at commuting
families of operators $\mathcal{F}$ such that T* lies in $\mathcal{F}$ whenever
T lies in $\mathcal{F}$, and again get a simultaneous diagonalization by an
orthogonal basis. This situation can be reduced to the previous one, by
replacing $\mathcal{F}$ with the collection of self-adjoint operators which
arise as T + T* or i(T - T*) for T in $\mathcal{F}$.



Chapter 7 

Linear operators between inner 
product spaces 



7.1 Preliminary remarks 

Suppose that $V_1$, $V_2$ are finite-dimensional vector spaces, both real or
both complex, and that $\langle \cdot, \cdot \rangle_1$ and
$\langle \cdot, \cdot \rangle_2$ are inner products on $V_1$ and $V_2$,
respectively. If T is a linear transformation from $V_1$ to $V_2$, then there
is a unique linear transformation $T^* : V_2 \to V_1$ such that

(7.1)    $\langle T(v), w \rangle_2 = \langle v, T^*(w) \rangle_1$

for all $v \in V_1$ and $w \in V_2$, for essentially the same reasons as
before. This is again called the adjoint of T.

Let us make the standing assumptions in this and the next two sections
that $(V_1, \langle \cdot, \cdot \rangle_1)$ and
$(V_2, \langle \cdot, \cdot \rangle_2)$ are as above, and that $\|\cdot\|_1$
and $\|\cdot\|_2$ denote the norms on $V_1$ and $V_2$ associated to these inner
products. For a, b = 1, 2, if $S : V_a \to V_b$ is a linear mapping, then
$\|S\|_{op,ab}$ denotes the operator norm of S defined using $\|\cdot\|_a$ on
the domain and $\|\cdot\|_b$ on the range. This can be characterized in terms
of inner products by

(7.2)    $\|S\|_{op,ab} = \sup\{ |\langle S(v), w \rangle_b| : v \in V_a, \ w \in V_b, \ \|v\|_a = \|w\|_b = 1 \}$,

as in (6.24).

This characterization of the operator norm implies that

(7.3)    $\|T\|_{op,12} = \|T^*\|_{op,21}$.

Also, $(T^*)^* = T$ for all $T : V_1 \to V_2$, and sums, scalar multiples, and
compositions of adjoints behave in the same manner as before, with the obvious
allowances for the changes in the domains and ranges. The C*-identities

(7.4)    $\|T^* T\|_{op,11} = \|T T^*\|_{op,22} = \|T\|_{op,12}^2$

can be verified as well.



7.2 Schmidt decompositions 

If $(V_1, \langle \cdot, \cdot \rangle_1)$ is different from
$(V_2, \langle \cdot, \cdot \rangle_2)$, then it does not make sense to talk
about an operator $T : V_1 \to V_2$ being self-adjoint, or normal, or
diagonalized in an orthonormal basis. However, we do get the linear mappings

(7.5)    $T^* T : V_1 \to V_1, \qquad T T^* : V_2 \to V_2$

(with the ranges the same as the domains), and T*T is self-adjoint as a
linear mapping on $V_1$ equipped with $\langle \cdot, \cdot \rangle_1$, while
TT* is self-adjoint as a linear mapping on $V_2$ equipped with
$\langle \cdot, \cdot \rangle_2$. We can use this to derive the following
Schmidt decomposition for arbitrary linear mappings from $V_1$ to $V_2$.

Proposition 7.6 Let T be a linear operator from $V_1$ to $V_2$. There exists an
orthonormal basis $\{u_j\}_{j=1}^m$ of $V_1$ and an orthogonal set of vectors
$\{w_j\}_{j=1}^m$ in $V_2$ such that

(7.7)    $T(v) = \sum_{j=1}^{m} \langle v, u_j \rangle_1 \, w_j$

for all $v \in V_1$.

Here the $w_j$'s can be equal to 0, as when the kernel of T has positive
dimension. In particular, this necessarily happens when the dimension of $V_2$
is less than the dimension of $V_1$.

To prove the proposition, let $\{u_j\}_{j=1}^m$ be an orthonormal basis of
$V_1$ consisting of eigenvectors of T*T. Such a basis exists, since
$T^* T : V_1 \to V_1$ is self-adjoint. Set $w_j = T(u_j)$ for each j, so that
(7.7) holds automatically. It is not hard to verify that the $w_j$'s are
orthogonal to each other in this situation.

Note that $T(u_j)$, when it is nonzero, is an eigenvector of TT* for each j,
with the same eigenvalue as $u_j$ has for T*T.
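
The orthogonality of the vectors $T(u_j)$ can be checked in one line. In the sketch below, $\mu_j$ denotes the eigenvalue of T*T associated to $u_j$; this symbol is introduced here only for the computation.

```latex
\begin{align*}
\langle T(u_j), T(u_k) \rangle_2
  = \langle T^* T(u_j), u_k \rangle_1
  = \mu_j \, \langle u_j, u_k \rangle_1
  = 0 \quad \text{when } j \ne k,
\qquad
\| T(u_j) \|_2^2 = \mu_j .
\end{align*}
```

In particular $\mu_j \ge 0$, and $T(u_j) = 0$ exactly when $\mu_j = 0$. The last remark of the section follows from $T T^*(T(u_j)) = T(T^* T(u_j)) = \mu_j \, T(u_j)$.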






7.3 The Hilbert-Schmidt norm 

Let T be a linear mapping from $V_1$ to $V_2$. Suppose that $\{a_j\}_{j=1}^m$
and $\{b_k\}_{k=1}^n$ are orthonormal bases for $V_1$ and $V_2$, respectively.
Thus

(7.8)    $v = \sum_{j=1}^{m} \langle v, a_j \rangle_1 \, a_j, \qquad w = \sum_{k=1}^{n} \langle w, b_k \rangle_2 \, b_k$

for all $v \in V_1$ and $w \in V_2$, and we can express T as

(7.9)    $T(v) = \sum_{j=1}^{m} \sum_{k=1}^{n} \langle v, a_j \rangle_1 \, \langle T(a_j), b_k \rangle_2 \, b_k$.

Consider the sum

(7.10)    $\sum_{j=1}^{m} \sum_{k=1}^{n} |\langle T(a_j), b_k \rangle_2|^2$.

By definition of the adjoint T*, this is the same as

(7.11)    $\sum_{j=1}^{m} \sum_{k=1}^{n} |\langle a_j, T^*(b_k) \rangle_1|^2$.

Because $\{a_j\}_{j=1}^m$ and $\{b_k\}_{k=1}^n$ are orthonormal bases for $V_1$
and $V_2$, these sums are also equal to

(7.12)    $\sum_{j=1}^{m} \|T(a_j)\|_2^2$

and to

(7.13)    $\sum_{k=1}^{n} \|T^*(b_k)\|_1^2$.

The Hilbert-Schmidt norm of T, denoted $\|T\|_{HS}$, is defined to be the
square root of (7.10). This can also be described as the $\|\cdot\|_2$ norm in
Section 4.1, applied to the matrix entries $\langle T(a_j), b_k \rangle_2$ of T
with respect to these choices of bases. The Hilbert-Schmidt norm does not
depend on the particular choices of the bases, however; namely, the equality
with (7.12) shows that the Hilbert-Schmidt norm does not depend on the choice
of $\{b_k\}_{k=1}^n$, and similarly the equality with (7.13) shows that the
Hilbert-Schmidt norm does not depend on the choice of $\{a_j\}_{j=1}^m$.

The equality between (7.10) and (7.11) implies that
$\|T^*\|_{HS} = \|T\|_{HS}$. More precisely, one can rewrite (7.11) as

(7.14)    $\sum_{j=1}^{m} \sum_{k=1}^{n} |\langle T^*(b_k), a_j \rangle_1|^2$,

and this is the same as (7.10) with T replaced by T*, and with the roles of
$V_1$ and $V_2$ exchanged (and exchanging the roles of $\{a_j\}_{j=1}^m$ and
$\{b_k\}_{k=1}^n$ with them).

Now suppose that we have two other inner product spaces
$(V_0, \langle \cdot, \cdot \rangle_0)$ and
$(V_3, \langle \cdot, \cdot \rangle_3)$, which are real if $V_1$, $V_2$ are
real, and complex if $V_1$, $V_2$ are complex. If $C_1 : V_0 \to V_1$ and
$C_2 : V_2 \to V_3$ are arbitrary linear mappings, then we have that

(7.15)    $\|T C_1\|_{HS} \le \|C_1\|_{op,01} \, \|T\|_{HS}, \qquad \|C_2 T\|_{HS} \le \|C_2\|_{op,23} \, \|T\|_{HS}$

(where we extend our earlier notation to mappings from $V_p$ to $V_q$ for
p, q = 0, 1, 2, 3). This follows from the formulas (7.12) and (7.13) for the
square of the Hilbert-Schmidt norm (allowing also for linear mappings from
$V_0$ to $V_2$ and from $V_1$ to $V_3$).

For any linear mapping $T : V_1 \to V_2$ we have that

(7.16)    $\|T\|_{op,12} \le \|T\|_{HS} \le \min(m, n)^{1/2} \, \|T\|_{op,12}$.

(Note that m and n are the dimensions of $V_1$ and $V_2$, respectively.) This
follows easily from the equality of $\|T\|_{HS}^2$ with (7.12) and (7.13), and
the fact that $\|T^*\|_{op,21} = \|T\|_{op,12}$. The first inequality in (7.16)
becomes an equality when T has rank 1, and equality holds in the second
inequality for some (nonzero) operators as well, e.g., when $V_1$ and $V_2$ are
the same and T is a multiple of the identity.
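
The two inequalities in (7.16), and the equality cases just mentioned, can be checked numerically for 2x2 real matrices. This is a minimal sketch assuming the standard inner product on R^2 and the standard basis, so that the Hilbert-Schmidt norm is the square root of the sum of the squared entries; `hs_norm`, `op_norm`, and the sample matrices are names and choices made for this example.

```python
import math

def hs_norm(T):
    """Hilbert-Schmidt norm: square root of the sum of squared matrix entries."""
    return math.sqrt(sum(x * x for row in T for x in row))

def op_norm(T):
    """Euclidean operator norm of a real 2x2 matrix, via eigenvalues of T^T T."""
    M = [[sum(T[k][i] * T[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = max(tr * tr - 4.0 * det, 0.0)
    return math.sqrt((tr + math.sqrt(disc)) / 2.0)

identity = [[1.0, 0.0], [0.0, 1.0]]   # equality in the second inequality
rank_one = [[1.0, 2.0], [2.0, 4.0]]   # equality in the first inequality
generic = [[1.0, 2.0], [3.0, 4.0]]    # both inequalities strict

for T in (identity, rank_one, generic):
    lower, middle = op_norm(T), hs_norm(T)
    upper = math.sqrt(2.0) * op_norm(T)   # min(m, n) = 2 here
    print(lower, middle, upper)
```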



7.4 A numerical feature 

Lemma 7.17 Suppose that V is a complex vector space with inner product
$\langle \cdot, \cdot \rangle$, and that T is a linear operator on V. Then T is
self-adjoint if and only if the numbers

(7.18)    $\langle T(v), v \rangle$

are real for all $v \in V$.

If T is self-adjoint, then we have that

(7.19)    $\langle T(v), v \rangle = \overline{\langle v, T(v) \rangle} = \overline{\langle v, T^*(v) \rangle} = \overline{\langle T(v), v \rangle}$,

by the basic properties of the inner product and the adjoint of a linear
transformation. Thus the numbers (7.18) are real in this case.

Conversely, suppose that the numbers (7.18) are all real. For any linear
operator T on V, we can write

(7.20)    $T = A + iB$,

where A and B are self-adjoint, by taking A = (T + T*)/2 and
B = (T - T*)/(2i). If the numbers (7.18) are all real, then the same is true
of the numbers

(7.21)    $i \, \langle B(v), v \rangle$,

since $\langle A(v), v \rangle \in \mathbf{R}$ for all $v \in V$ by the first
part of the proof. On the other hand, $\langle B(v), v \rangle \in \mathbf{R}$
for all $v \in V$ as well, since B is self-adjoint, and hence

(7.22)    $\langle B(v), v \rangle = 0$ for all $v \in V$.

This last implies that B = 0, using the diagonalizability of B. Thus T = A,
so that T is self-adjoint. This completes the proof of the lemma.
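
The role of complex scalars in Lemma 7.17 can be seen in a small computation. The sketch below assumes the standard inner product on C^2, linear in the first variable; the matrices H, N, J and the vector v are examples chosen here. The real matrix J satisfies $\langle J(v), v \rangle = 0$ for every real vector v although it is not symmetric, but a complex vector detects the failure of self-adjointness.

```python
def inner(x, y):
    """Inner product on C^n, linear in the first slot, conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(T, v):
    """Apply the matrix T (a list of rows) to the vector v."""
    return [sum(T[i][k] * v[k] for k in range(len(v))) for i in range(len(T))]

def quadratic_form(T, v):
    """The number <T(v), v> from (7.18)."""
    return inner(apply(T, v), v)

H = [[2, 1j], [-1j, 3]]   # Hermitian, hence self-adjoint
N = [[0, 1], [0, 0]]      # not self-adjoint
J = [[0, 1], [-1, 0]]     # real and antisymmetric, not self-adjoint

v = [1, 1j]               # a genuinely complex test vector
real_v = [2, 5]           # a real test vector

values = {
    "H": quadratic_form(H, v),            # real
    "N": quadratic_form(N, v),            # not real
    "J_complex": quadratic_form(J, v),    # not real
    "J_real": quadratic_form(J, real_v),  # 0, as for every real vector
}
```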



7.5 Numerical range 

Let $(V, \langle \cdot, \cdot \rangle)$ be a (finite-dimensional) inner product
space, and let T be a linear operator on V. Consider the set

(7.23)    $W(T) = \{ \langle T(v), v \rangle : v \in V, \ \|v\| = 1 \}$.

This set is compact, for standard reasons of continuity and the compactness
of the unit sphere in V.

Suppose first that V is a real vector space, so that W(T) is a subset of
the real line. If we set

(7.24)    $A_1 = \frac{T + T^*}{2}$ and $A_2 = \frac{T - T^*}{2}$,

then $T = A_1 + A_2$, $A_1^* = A_1$, and $A_2^* = -A_2$, i.e., $A_1$ is
symmetric and $A_2$ is antisymmetric. For $A_2$ we have that

(7.25)    $\langle A_2(v), v \rangle = -\langle v, A_2(v) \rangle = -\langle A_2(v), v \rangle$,

so that $\langle A_2(v), v \rangle = 0$ for all $v \in V$. Thus

(7.26)    $W(T) = W(A_1)$.


Because $A_1$ is symmetric, it can be diagonalized in an orthonormal basis.
Using this one can check that

(7.27)    $W(A_1)$ is convex,

and in fact that $W(A_1)$ is equal to the convex hull of the eigenvalues of
$A_1$.

Alternatively, one might observe that W(T) is connected, because the
image of a connected set under a continuous mapping is connected. The unit
sphere in V is connected as long as the dimension of V is at least 2, and
when the dimension of V is 1, W(T) consists of a single point anyway. It is
also nice to observe that

(7.28)    $\langle T(v), v \rangle = \langle T(-v), -v \rangle$,

so that v and -v contribute the same value to the numerical range. Of course
connected subsets of $\mathbf{R}$ are automatically convex.

Now let us turn to the case where V is a complex vector space, and W(T)
is a subset of $\mathbf{C}$. It is no longer true that the numerical range of
T is equal to the numerical range of the self-adjoint part of T, which is to
say that the anti-self-adjoint part of T can contribute to W(T).

If T is normal, so that the self-adjoint and anti-self-adjoint parts of T
commute, then T can be diagonalized in an orthonormal basis of V, as
mentioned in Section 6.4. As in the real case, it is not hard to use this to
verify that W(T) is equal to the convex hull of the spectrum of T.

In fact, W(T) is convex for any linear operator T on V. See Chapter 17
of [Hal].
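
For a real 2x2 example, the identity (7.26) and the description of the numerical range as the interval between the extreme eigenvalues of the symmetric part can be checked by sampling the unit circle. This is only a sketch under those assumptions: the matrix T and the sampling resolution n are arbitrary choices, and `form` is a helper name introduced here.

```python
import math

T = [[1.0, 2.0], [0.0, 3.0]]
# Symmetric part A_1 = (T + T^*)/2, with T^* the transpose in the real case.
A1 = [[(T[i][j] + T[j][i]) / 2.0 for j in range(2)] for i in range(2)]

def form(M, v):
    """The number <M(v), v> for a real 2x2 matrix M and a vector v in R^2."""
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

n = 3600  # number of sample points on the unit circle
unit_vectors = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
                for k in range(n)]
values_T = [form(T, v) for v in unit_vectors]
values_A1 = [form(A1, v) for v in unit_vectors]

# Eigenvalues of the symmetric 2x2 matrix A1 from its trace and determinant.
tr = A1[0][0] + A1[1][1]
det = A1[0][0] * A1[1][1] - A1[0][1] * A1[1][0]
lam_min = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0
lam_max = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
```

The sampled values of `form(T, v)` and `form(A1, v)` coincide up to rounding, and they fill out the interval from `lam_min` to `lam_max`.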



Chapter 8 

Subspaces and quotient spaces 



As before, it will be convenient to make the standing assumption in this 
chapter that all vector spaces are finite-dimensional. 

8.1 Linear algebra 

Let V be a vector space (real or complex), and let W be a subspace of
V. There is a standard notion of the quotient V/W of V by W, which
is obtained by identifying pairs of points in V whose difference lies in W.
These identifications are compatible with addition and scalar multiplication
of vectors, so that the quotient V/W is a vector space in a natural way.

One also gets the quotient mapping $q : V \to V/W$, which reflects the
identifications. This is a linear mapping from V onto V/W whose kernel is
equal to W.

Since V/W is a vector space, it has a dual space $(V/W)^*$, and the dual
space can be identified with a subspace of the dual $V^*$ of V in a natural
way. Specifically, every linear functional on V/W determines a linear
functional on V, by composition with the quotient mapping q. The linear
functionals on V which arise in this manner are exactly the ones that vanish
on W.

In general, if X is a subspace of V, one sometimes writes $X^\perp$ for the
subspace of $V^*$ consisting of linear functionals $\lambda$ on V which are
equal to 0 on X. Since $V^*$ is a vector space in its own right, one can
consider the subspace $(X^\perp)^\perp$ of the second dual $V^{**}$. The second
dual $V^{**}$ is isomorphic to V in a natural way, as in Section 4.3, and it is
easy to check that $(X^\perp)^\perp$ corresponds exactly to X under this
isomorphism. The previous statement about $(V/W)^*$ can be rephrased as saying
that $(V/W)^*$ is isomorphic in a natural way to $W^\perp \subseteq V^*$.

Similarly, if X is any vector subspace of V, then every linear functional
on V restricts to a linear functional on X. Every linear functional on X
arises in this manner, and two linear functionals $\lambda_1$, $\lambda_2$ on V
induce the same linear functional on X exactly when
$\lambda_1 - \lambda_2 \in X^\perp$. In this way, $X^*$ is isomorphic in
a natural way to $V^*/X^\perp$.

This second scenario is essentially equivalent to the first one, with
$X \subseteq V$ corresponding to $W^\perp \subseteq V^*$, and using the
isomorphism between a vector space and its second dual and between X and
$(X^\perp)^\perp$.

Remark 8.1 A vector subspace U of V is complementary to W (Definition
6.49 in Section 6.6) if and only if the restriction of the quotient mapping
$q : V \to V/W$ to U is one-to-one and maps U onto V/W.



8.2 Quotient spaces and norms 

Now suppose that V is a vector space equipped with a norm $\|\cdot\|$. Let W be
a vector subspace of V, and consider the quotient V/W and the associated
quotient mapping $q : V \to V/W$. The quotient norm $\|\cdot\|_Q$ on V/W can be
defined as

(8.2)    $\|z\|_Q = \inf\{ \|v\| : v \in V, \ q(v) = z \}$.

This is the same as

(8.3)    $\|q(v)\|_Q = \inf\{ \|v + w\| : w \in W \}$.

It is not hard to check that $\|\cdot\|_Q$ does indeed define a norm on V/W.

Using the norm $\|\cdot\|$ on V and the norm $\|\cdot\|_Q$ on V/W one can
define the operator norm for any linear transformation from V to V/W. The
operator norm of the quotient mapping $q : V \to V/W$ is equal to 1 (except in
the degenerate case where W = V).

Because of our standing assumption that V be finite-dimensional, a
minimizing $v \in V$ for the infimum in (8.2) exists. It is not unique, in
general, but this is the case under assumptions of strict convexity.



Let U be a subspace of V which is complementary to W, and let P be
the projection of V onto U with kernel W, as in Section 6.6. Let us verify
that

(8.4)    $\|P\|_{op}^{-1} \, \|u\| \le \|q(u)\|_Q \le \|u\|$ for all $u \in U$,

where $\|P\|_{op}$ denotes the operator norm of P as a linear operator on V
(with respect to the norm $\|\cdot\|$ on V). The second inequality in (8.4)
holds automatically from the definition of the quotient norm. As for the first
inequality, if w is any element of W, then P(u + w) = u, and hence

(8.5)    $\|u\| \le \|P\|_{op} \, \|u + w\|$.

One can take the infimum of the right side over $w \in W$ to get
$\|P\|_{op}^{-1} \, \|u\| \le \|q(u)\|_Q$, as desired.

In particular, if $\|P\|_{op}$ happens to be equal to 1, then the restriction
of the quotient mapping q to U defines an isometry from U equipped with the
norm $\|\cdot\|$ to V/W equipped with the quotient norm $\|\cdot\|_Q$. This
includes the case where the norm $\|\cdot\|$ on V comes from an inner product
and U is the orthogonal complement of W.
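
To make (8.3) and (8.4) concrete, take V = R^2 and W the line spanned by (1, 1). The sketch below approximates the infimum in (8.3) by a ternary search over the parameter t in v + t(1, 1), which is legitimate because a norm is convex; the search interval and iteration count are assumptions suited to small inputs, and the function names are introduced here. For this W the quotient norm of q(v) works out to $|v_1 - v_2|$, $|v_1 - v_2|/\sqrt{2}$, and $|v_1 - v_2|/2$ for the norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$, respectively.

```python
import math

def norm_1(x):
    return abs(x[0]) + abs(x[1])

def norm_2(x):
    return math.hypot(x[0], x[1])

def norm_inf(x):
    return max(abs(x[0]), abs(x[1]))

def quotient_norm(v, norm, lo=-1e3, hi=1e3, iters=200):
    """Approximate inf over t of norm(v + t*(1, 1)) by ternary search.

    The function t -> norm(v + t*(1, 1)) is convex, so ternary search on a
    bracketing interval converges to its minimum value. The interval
    [lo, hi] is assumed to contain a minimizer.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        f1 = norm([v[0] + m1, v[1] + m1])
        f2 = norm([v[0] + m2, v[1] + m2])
        if f1 <= f2:
            hi = m2
        else:
            lo = m1
    t = (lo + hi) / 2.0
    return norm([v[0] + t, v[1] + t])

v = [3.0, -1.0]
q1 = quotient_norm(v, norm_1)      # |v1 - v2| = 4
q2 = quotient_norm(v, norm_2)      # |v1 - v2| / sqrt(2)
qinf = quotient_norm(v, norm_inf)  # |v1 - v2| / 2 = 2

# (8.4) with the Euclidean norm, U the x-axis, and P(v) = (v1 - v2, 0),
# the projection onto U with kernel W; its operator norm is sqrt(2).
u = [1.0, 0.0]
lower_bound = norm_2(u) / math.sqrt(2.0)
middle = quotient_norm(u, norm_2)
```

In the last example the first inequality in (8.4) is an equality, so the factor $\|P\|_{op}^{-1}$ cannot be improved for this choice of U.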

Let us consider the quotient norm $\|\cdot\|_Q$ in terms of duality. As in
Section 8.1, the dual of V/W as a vector space can be identified with
$W^\perp \subseteq V^*$. Let $\|\cdot\|_*$ be the dual norm of $\|\cdot\|$ on
$V^*$, which can be restricted to $W^\perp$. Under the isomorphism between
$(V/W)^*$ and $W^\perp$, the dual of the quotient norm $\|\cdot\|_Q$
corresponds exactly to $\|\cdot\|_*$ on $W^\perp$. Indeed, if
$\lambda \in W^\perp$, then $\lambda(w) = 0$ when $w \in W$, and

(8.6)    $|\lambda(v)| \le \|\lambda\|_* \, \|v\|$ for all $v \in V$,

by the definition of the dual norm. We can combine these two pieces of
information to obtain

(8.7)    $|\lambda(v)| \le \|\lambda\|_* \, \inf\{ \|v + w\| : w \in W \}$ for all $v \in V$.

This is the same as saying that the linear functional on V/W induced by
$\lambda$ has norm less than or equal to $\|\lambda\|_*$ with respect to the
norm $\|\cdot\|_Q$ on V/W. In the other direction, if the linear functional on
V/W induced by $\lambda \in W^\perp$ has norm less than or equal to k with
respect to $\|\cdot\|_Q$, then

(8.8)    $|\lambda(v)| \le k \, \inf\{ \|v + w\| : w \in W \}$ for all $v \in V$.

This implies trivially that

(8.9)    $|\lambda(v)| \le k \, \|v\|$ for all $v \in V$,

i.e., $\|\lambda\|_* \le k$. Hence $\|\lambda\|_*$ is less than or equal to the
norm of the linear functional on V/W induced by $\lambda$ with respect to the
norm $\|\cdot\|_Q$ on V/W. This shows that the dual of the quotient norm
$\|\cdot\|_Q$ corresponds exactly to the dual norm $\|\cdot\|_*$ restricted to
$W^\perp \subseteq V^*$.

It is instructive to look directly at the case of subspaces as well, even
though this could be derived from the preceding discussion using previous
results about second duals. Specifically, suppose that X is a vector subspace
of V, to which we can restrict the norm $\|\cdot\|$. This leads to a norm on
the dual space of X. As a vector space, the dual of X is isomorphic in a
natural way to $V^*/X^\perp$, as discussed in Section 8.1. Under this
isomorphism, the dual norm of X corresponds exactly to the quotient norm on
$V^*/X^\perp$ defined using the dual norm $\|\cdot\|_*$ on $V^*$. To see this,
it is not hard to check that the dual norm of X is less than or equal to the
quotient norm on $V^*/X^\perp$ defined using $\|\cdot\|_*$, just by the various
definitions. The other inequality is less obvious, and it amounts to the
statement that a linear functional on X can be extended to a linear functional
on all of V with the same norm. This holds by Theorem 4.17.



8.3 Mappings between vector spaces 

Suppose that $V_1$ and $V_2$ are vector spaces, both real or both complex, and
that $\|\cdot\|_1$ and $\|\cdot\|_2$ are norms on $V_1$ and $V_2$,
respectively. Let Y be a vector subspace of $V_1$, and assume that we have a
linear mapping $T : Y \to V_2$. Under what conditions can we find an extension
of T to a linear mapping from $V_1$ to $V_2$ which has the same operator norm
as the original mapping, or perhaps an operator norm which is not too much
larger? Here the operator norms use $\|\cdot\|_1$ on the domain and
$\|\cdot\|_2$ on the range.

In general an extension with the same norm does not exist, and it is not 
so easy to find an extension whose norm is not too much larger than the 
original mapping. Let us mention a few basic points related to this, however. 

Suppose that P is a projection from $V_1$ onto Y, and let k be the operator
norm of P (relative to $\|\cdot\|_1$ in both the domain and range). One way to
get an extension of T is to simply take the composition $T \circ P$. The
operator norm of $T \circ P$ (using $\|\cdot\|_1$ on the domain and
$\|\cdot\|_2$ on the range) is bounded by k times the operator norm of T,
since the operator norm of a composition is at most the product of the
operator norms.

This works quite well when the norm || ■ ||i on Vi comes from an inner 
product, so that there is a projection of norm 1 onto any subspace of Vi, 
namely, the orthogonal projection. 

Note that one might take $V_2 = Y$, $\|\cdot\|_2 = \|\cdot\|_1$ on Y, and T to
be the identity on Y. Extensions of T to mappings from $V_1$ to $V_2 = Y$ are
then exactly projections of $V_1$ onto Y. Thus the original question becomes
one about norms of projections onto Y.

In another vein, if $V_2$ is 1-dimensional, then the question reduces to that
of extending linear functionals, and this is possible while keeping the same
norm because of Theorem 4.17.

This can be carried over to the case where $V_2$ is $\mathbf{R}^n$ or
$\mathbf{C}^n$ for some n and one employs the norm $\|\cdot\|_\infty$ from
Section 4.1. A linear mapping from a vector space $V_1$ into $\mathbf{R}^n$ or
$\mathbf{C}^n$ (according to whether $V_1$ is real or complex) can be described
in terms of n linear functionals on $V_1$, corresponding to the n coordinates
for $\mathbf{R}^n$ or $\mathbf{C}^n$. When we employ the norm
$\|\cdot\|_\infty$ on $\mathbf{R}^n$ or $\mathbf{C}^n$, then the norm of a
mapping from $V_1$ to $\mathbf{R}^n$ or $\mathbf{C}^n$ is the same as
the maximum of the norms of these n linear functionals on $V_1$ (using the
norm that we have on $V_1$). In this way the extension problem reduces to its
counterpart for linear functionals.
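
The claim about the norm of a mapping into $\mathbf{R}^n$ or $\mathbf{C}^n$ with the norm $\|\cdot\|_\infty$ amounts to exchanging a supremum and a maximum. In the sketch below, $\lambda_1, \ldots, \lambda_n$ denote the coordinate functionals of the mapping T, and $\|\cdot\|$ and $\|\cdot\|_*$ denote the given norm on the domain and its dual norm; these names are introduced here for the computation.

```latex
\begin{align*}
T(v) &= \bigl(\lambda_1(v), \ldots, \lambda_n(v)\bigr),
\qquad \|T(v)\|_\infty = \max_{1 \le i \le n} |\lambda_i(v)|, \\
\|T\|_{op} &= \sup_{\|v\| \le 1} \, \max_{1 \le i \le n} |\lambda_i(v)|
            = \max_{1 \le i \le n} \, \sup_{\|v\| \le 1} |\lambda_i(v)|
            = \max_{1 \le i \le n} \|\lambda_i\|_* .
\end{align*}
```

Thus if T is defined on a subspace Y, one can extend each $\lambda_i$ separately to the whole space without increasing its norm, using Theorem 4.17, and the resulting mapping has the same operator norm as T.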

Here is a "dual" version of the question at the beginning of the section.
Suppose that $Z_1$ and $Z_2$ are vector spaces, both real or both complex,
and equipped with norms. Assume also that we are given a subspace $Y$ of
$Z_2$, which we can use to define the quotient $Z_2/Y$. Let
$q : Z_2 \to Z_2/Y$ be the corresponding quotient mapping. If $A$ is a linear
mapping from $Z_1$ to $Z_2/Y$, under what conditions can we "lift" $A$ to a
linear mapping $\hat{A} : Z_1 \to Z_2$ such that $A = q \circ \hat{A}$ and
the norm of $\hat{A}$ is the same as the norm of $A$, or not too much larger?
For the norm of $A$, we use the quotient norm on $Z_2/Y$ coming from the
given norm on $Z_2$.

There are natural duals of the remarks above for the previous question.
If $U$ is a subspace of $Z_2$ which is complementary to $Y$, then the
restriction of $q$ to $U$ is one-to-one and maps $U$ onto $Z_2/Y$. Thus we
can obtain liftings $\hat{A}$ of $A$ which take values in $U$. There is then
the question of the norm of the lifting, which can be analyzed in terms of
projections, through the earlier remarks on projections. If the norm on $Z_2$
comes from an inner product, then one can take $U$ to be the orthogonal
complement of $Y$, and the lifting to $U$ has the same norm as the original
mapping $A$.

If $Z_1$ has dimension 1, then it is easy to do the lifting, while keeping
the norm fixed. If instead $Z_1$ is $\mathbf{R}^n$ or $\mathbf{C}^n$,
equipped with the norm $\|\cdot\|_1$ defined earlier (so that $\|\cdot\|_1$
is not now just a generic name for a norm, but rather a very specific one),
then the argument for 1-dimensional domains can be carried over in a simple
manner. More precisely, let $e_j$, $1 \le j \le n$, be the standard basis
vectors for $\mathbf{R}^n$ or $\mathbf{C}^n$ (according to whether one is
working with real or complex vector spaces). The idea is to first choose
$\hat{A}(e_j)$ for each $j$ so that $q(\hat{A}(e_j)) = A(e_j)$ and the norm
of $\hat{A}(e_j)$ is equal to the (quotient) norm of $A(e_j)$ for each $j$.
Once this is done, $\hat{A}$ is determined on the whole domain by linearity.
Because of the specific norm that we have on the domain, the norm of
$\hat{A}$ as a linear mapping will be the same as the norm of $A$.
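
The norm computation behind this last claim can be written out as follows.
This is a sketch in notation introduced only for the display: $\hat{A}$
denotes the lifting, $\|\cdot\|_{Z_2}$ and $\|\cdot\|_{Z_2/Y}$ denote the
norm on $Z_2$ and the quotient norm, and $x = \sum_{j=1}^n x_j e_j$.

```latex
\[
  \|\hat{A}(x)\|_{Z_2}
  \le \sum_{j=1}^n |x_j| \, \|\hat{A}(e_j)\|_{Z_2}
  = \sum_{j=1}^n |x_j| \, \|A(e_j)\|_{Z_2/Y}
  \le \Bigl( \max_{1 \le j \le n} \|A(e_j)\|_{Z_2/Y} \Bigr) \|x\|_1
  \le \|A\| \, \|x\|_1 .
\]
```

The last step uses $\|e_j\|_1 = 1$. The reverse inequality between the two
operator norms holds because the quotient mapping does not increase norms.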



Chapter 9 

Variation seminorms 



9.1 Basic definitions 

Let $\mathbf{Z}$ denote the set of integers, and let $\mathbf{Z}^n$ denote
the set of $n$-tuples of integers (for a given positive integer $n$).

If $x$, $y$ are elements of $\mathbf{Z}^n$, then we shall say that $x$ and
$y$ are neighbors if there is an integer $i$, $1 \le i \le n$, such that
$y_i - x_i = \pm 1$ and $y_j = x_j$ for all $j \ne i$, $1 \le j \le n$, where
$x_i$ and $y_i$ denote the components of $x$ and $y$. This is clearly
symmetric in $x$ and $y$. We shall denote by $N(x)$ the set of neighbors of
$x$. Note that $x$ is not considered to be a neighbor of itself.

If $U$ is a subset of $\mathbf{Z}^n$, then we define
$\operatorname{Int} U, \partial U \subseteq U$ by

(9.1) $\operatorname{Int} U = \{x \in U : N(x) \subseteq U\}$

and

(9.2) $\partial U = U \setminus \operatorname{Int} U$.

This notation is somewhat at odds with that in topology, but these definitions
will play an analogous role.

In this chapter we shall make the standing assumption that

(9.3) $U \subseteq \mathbf{Z}^n$ is finite and $\operatorname{Int} U \ne \emptyset$.

Let $f$ be a function on $U$. By "function" we mean one that is real-valued
or complex-valued, so that we have the real vector space of real-valued
functions on $U$, or the complex vector space of complex-valued functions on
$U$. Sometimes it will be helpful to specify one or the other, but often it
will not matter.

For each real number $p \ge 1$, define the $p$-variation $V_p(f)$ of $f$ (on
$U$) by

(9.4) $V_p(f) = \Bigl( \sum_{x \in \operatorname{Int} U} \sum_{y \in N(x)} |f(x) - f(y)|^p \Bigr)^{1/p}$.

This is a seminorm on the vector space of functions on $U$ (real or complex),
which means that it satisfies the same properties as a norm except that
$V_p(f)$ might be 0 even if $f$ is not the zero function. In particular,
$V_p(f) = 0$ whenever $f$ is a constant function. The fact that $V_p(\cdot)$
satisfies the triangle inequality

(9.5) $V_p(f + g) \le V_p(f) + V_p(g)$

for all functions $f$ and $g$ on $U$ is a consequence of Minkowski's
inequality for sums (using the same choice of $p$).

As in the case of norms, we can apply the triangle inequality twice to
obtain that

(9.6) $V_p(f) \le V_p(g) + V_p(f - g)$ and $V_p(g) \le V_p(f) + V_p(f - g)$,

and hence that

(9.7) $|V_p(f) - V_p(g)| \le V_p(f - g)$

for all functions $f$ and $g$ on $U$. One can use this to show that
$V_p(\cdot)$ is continuous on the vector space of functions on $U$, because
$V_p(\cdot)$ is bounded by a constant multiple of a standard norm on
functions.

Observe that elements of $U$ which are not neighbors of elements of
$\operatorname{Int} U$ are not used in (9.4), so that the values of $f$ at
such points do not affect $V_p(f)$. One might as well take out these points
from $U$.
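
To make these definitions concrete, here is a minimal Python sketch. It
takes $V_p(f)$ to be the $p$-th root of the sum of $|f(x) - f(y)|^p$ over
$x \in \operatorname{Int} U$ and $y \in N(x)$, and uses a $4 \times 4$ square
in $\mathbf{Z}^2$ as $U$. The choice of $U$, the test functions, and all
function names are assumptions made for illustration.

```python
from itertools import product

def neighbors(x):
    """All points of Z^n differing from x by +-1 in exactly one coordinate."""
    result = []
    for i in range(len(x)):
        for step in (-1, 1):
            y = list(x)
            y[i] += step
            result.append(tuple(y))
    return result

def interior(U):
    """Int U: the points of U all of whose neighbors lie in U."""
    return {x for x in U if all(y in U for y in neighbors(x))}

def p_variation(f, U, p):
    """V_p(f): sum over x in Int U and y in N(x), then take the p-th root."""
    total = sum(abs(f[x] - f[y]) ** p
                for x in interior(U) for y in neighbors(x))
    return total ** (1.0 / p)

# Example: a 4 x 4 square in Z^2 (an arbitrary choice for illustration).
U = set(product(range(4), repeat=2))
int_U = interior(U)
boundary_U = U - int_U

f = {x: float(x[0] * x[0] + x[1]) for x in U}
g = {x: float(x[0] - 2 * x[1]) for x in U}
constant = {x: 7.0 for x in U}
f_plus_g = {x: f[x] + g[x] for x in U}
```

With these definitions one can check directly that constant functions have
$p$-variation 0 and that the triangle inequality holds for `f` and `g`. Note
that the four corners of the square are not neighbors of interior points, so
the values there do not enter the computation.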

It will sometimes be of interest to restrict our attention to functions $f$
on $U$ which are equal to 0 on $\partial U$. In this case $V_p(f)$ can be
rewritten as

(9.8) $V_p(f) = \Bigl( \sum_{(x, y) \in D(\operatorname{Int} U)} |f(x) - f(y)|^p \Bigr)^{1/p}$,

where

(9.9) $D(\operatorname{Int} U) = \{(x, y) \in \operatorname{Int} U \times \operatorname{Int} U : x, y \text{ are neighbors}\}$.

On the vector space of functions on $U$ that are 0 on $\partial U$, $V_p(f)$
is a norm, i.e., $V_p(f) = 0$ implies that $f$ is the zero function on $U$.
(Exercise.)

9.2 The p = 2 and n = 1, p = 1 cases

When $p = 2$, $V_2(f)^2$ is a quadratic form (Hermitian quadratic form in the
complex case), and this leads to special features. Let $\langle f, g \rangle$
be the standard inner product for functions on $U$, defined by

(9.10) $\langle f, g \rangle = \sum_{x \in U} f(x) \, g(x)$

if we are working with real-valued functions, and by

(9.11) $\langle f, g \rangle = \sum_{x \in U} f(x) \, \overline{g(x)}$

in the setting of complex-valued functions. In either situation, there is a
unique linear operator $A$ on functions on $U$ which is self-adjoint and
satisfies

(9.12) $V_2(f)^2 = \langle A(f), f \rangle$

for all $f$. This is not hard to see (and is an instance of a standard way of
writing quadratic forms in general).

It is also natural to consider the restriction to the subspace of functions
which vanish on $\partial U$. There is again a linear operator $A_0$ mapping
this subspace to itself which is self-adjoint (with respect to the restriction
of the inner product to the subspace) and satisfies

(9.13) $V_2(f)^2 = \langle A_0(f), f \rangle$

for all $f$ in the subspace. This operator $A_0$ is the compression of $A$ to
the subspace, which means that $A_0$ is obtained by restricting $A$ to the
subspace, and then composing it with the orthogonal projection from the whole
space to the subspace.

We can describe $A_0$ in terms of a matrix as follows. For each $x \in U$,
let $e_x$ denote the function on $U$ which is 1 at $x$ and 0 at all other
points. The functions $e_x$, $x \in U$, define an orthonormal basis for the
space of functions on $U$, and the functions $e_x$,
$x \in \operatorname{Int} U$, define an orthonormal basis for the subspace of
functions which vanish on $\partial U$.

For each $x, y \in \operatorname{Int} U$, $x \ne y$, we have that

(9.14) $\langle A_0(e_x), e_y \rangle = -1$ if $x$, $y$ are neighbors,
       $\langle A_0(e_x), e_y \rangle = 0$ if $x$, $y$ are not neighbors,

and $\langle A_0(e_x), e_x \rangle$ is equal to the number of neighbors of
$x \in \operatorname{Int} U$ which also lie in $\operatorname{Int} U$.
(Exercise. Note that this is symmetric in $x$ and $y$, as it should be, for
self-adjointness.) The matrix for $A$ can be determined analogously.

Now suppose that $n = 1$ and $p = 1$. Let $a$, $b$ be integers with
$a \le b - 2$, and let $U$ denote the set of integers in the interval
$[a, b]$. In the notation of Section 9.1, $\operatorname{Int} U$ is the set
of integers in $(a, b)$, and $\partial U = \{a, b\}$.

If $f$ is a function on $U$, then

(9.15) $V_1(f) \ge |f(b) - f(a)|$.

Let us restrict our attention to functions which are real-valued. If $f$ is
monotone increasing on $U$, then

(9.16) $V_1(f) = f(b) - f(a)$,

and

(9.17) $V_1(f) = f(a) - f(b)$

when $f$ is monotone decreasing on $U$. Conversely,

(9.18) $V_1(f) = |f(b) - f(a)|$

implies that $f$ is monotone (increasing or decreasing), as one can check.



9.3 Minimization 



Suppose that $U \subseteq \mathbf{Z}^n$ is as in Section 9.1, and fix a real
number $p \ge 1$. Let $b$ be a function on $\partial U$. We shall be
interested in functions $f$ on $U$ such that $f = b$ on $\partial U$ and
$V_p(f)$ is as small as possible.

For the usual reasons of continuity and compactness, minimizing functions
$f$ exist. More precisely, although the set of all functions $f$ on $U$ which
agree with $b$ on $\partial U$ is not bounded, one only needs to consider a
bounded subset for the minimization. In other words, it is enough to look at
such functions $f$ for which $V_p(f)$ is bounded, since we are trying to
minimize it. Because we are fixing the boundary values of the functions,
bounding $V_p(f)$ leads to a bounded set of functions. Otherwise, there is the
problem that nonzero functions $h$ can have $V_p(h) = 0$, such as nonzero
constant functions.

When $p = 1$, minimizers with prescribed boundary values like this do not
have to be unique. Examples of this can be seen from the second part of
Section 9.2.

Let us look at the question of uniqueness in a somewhat general way.
Assume that $f_1$ and $f_2$ are two minimizers of $V_p(\cdot)$ on $U$ for the
same set of boundary values $b$. In particular, $V_p(f_1) = V_p(f_2)$. For
each $t \in [0, 1]$, $t f_1 + (1 - t) f_2$ is a function on $U$ with boundary
values $b$, and

(9.19) $V_p(t f_1 + (1 - t) f_2) \le t \, V_p(f_1) + (1 - t) \, V_p(f_2)$.

On the other hand, $V_p(t f_1 + (1 - t) f_2) \ge V_p(f_1) = V_p(f_2)$, by the
minimality of $f_1$ and $f_2$, and hence

(9.20) $V_p(t f_1 + (1 - t) f_2) = V_p(f_1) = V_p(f_2)$,

and $t f_1 + (1 - t) f_2$ is a minimizer as well. (This would work as well for
minimizing any convex function.)

We can rewrite (9.20) as

(9.21) $\sum_{x \in \operatorname{Int} U} \sum_{y \in N(x)} |t f_1(x) + (1 - t) f_2(x) - t f_1(y) - (1 - t) f_2(y)|^p$
       $= \sum_{x \in \operatorname{Int} U} \sum_{y \in N(x)} |f_1(x) - f_1(y)|^p$
       $= \sum_{x \in \operatorname{Int} U} \sum_{y \in N(x)} |f_2(x) - f_2(y)|^p$.

On the other hand,

(9.22) $|t f_1(x) + (1 - t) f_2(x) - t f_1(y) - (1 - t) f_2(y)|^p$
       $\le \bigl( t \, |f_1(x) - f_1(y)| + (1 - t) \, |f_2(x) - f_2(y)| \bigr)^p$
       $\le t \, |f_1(x) - f_1(y)|^p + (1 - t) \, |f_2(x) - f_2(y)|^p$,

by the triangle inequality and the monotonicity and convexity of the function
$u^p$ on $[0, \infty)$. Because of the equality

(9.23) $\sum_{x \in \operatorname{Int} U} \sum_{y \in N(x)} |t f_1(x) + (1 - t) f_2(x) - t f_1(y) - (1 - t) f_2(y)|^p$
       $= \sum_{x \in \operatorname{Int} U} \sum_{y \in N(x)} \bigl( t \, |f_1(x) - f_1(y)|^p + (1 - t) \, |f_2(x) - f_2(y)|^p \bigr)$

(resulting from (9.21)), we obtain that

(9.24) $|t f_1(x) + (1 - t) f_2(x) - t f_1(y) - (1 - t) f_2(y)|^p$
       $= \bigl( t \, |f_1(x) - f_1(y)| + (1 - t) \, |f_2(x) - f_2(y)| \bigr)^p$
       $= t \, |f_1(x) - f_1(y)|^p + (1 - t) \, |f_2(x) - f_2(y)|^p$

for all $x \in \operatorname{Int} U$ and $y \in N(x)$, i.e., we have equality
in both inequalities in (9.22).

The first equality in (9.24) is equivalent to

(9.25) $|t f_1(x) + (1 - t) f_2(x) - t f_1(y) - (1 - t) f_2(y)|$
       $= t \, |f_1(x) - f_1(y)| + (1 - t) \, |f_2(x) - f_2(y)|$,

which is to say that we have equality for the triangle inequality. As a
consequence, for each $x \in \operatorname{Int} U$ and each $y \in N(x)$
there is a nonzero real or complex number $w$ (depending on whether we are
working with real or complex-valued functions) such that $f_1(x) - f_1(y)$
and $f_2(x) - f_2(y)$ are both multiples of $w$ by nonnegative real numbers.

Now suppose that $p > 1$. The second equality in (9.24) and the strict
convexity of the function $u^p$ on $[0, \infty)$ imply that

(9.26) $|f_1(x) - f_1(y)| = |f_2(x) - f_2(y)|$

for all $x \in \operatorname{Int} U$ and all $y \in N(x)$. Therefore,

(9.27) $f_1(x) - f_1(y) = f_2(x) - f_2(y)$

for all $x \in \operatorname{Int} U$ and $y \in N(x)$, because of the
information derived in the previous paragraph.

From this it follows that $f_1 = f_2$ on all of $U$, since $f_1$ and $f_2$
agree on $\partial U$ by assumption. In short, we have uniqueness for
minimizers when $p > 1$.
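
The contrast between $p = 1$ and $p > 1$ can already be seen in the smallest
possible example. The Python sketch below assumes $U = \{0, 1, 2\}$ in
$\mathbf{Z}$, so that $\operatorname{Int} U = \{1\}$, with boundary values
$b(0) = 0$ and $b(2) = 1$, and searches a grid of candidate values for $f(1)$.
The grid and the function names are illustrative choices, not part of the
text.

```python
from fractions import Fraction

def variation_to_the_p(v, p):
    """V_p(f)^p on U = {0, 1, 2} with f(0) = 0, f(1) = v, f(2) = 1.

    Here Int U = {1} and N(1) = {0, 2}, so only two terms appear.
    """
    return abs(v - 0) ** p + abs(v - 1) ** p

# Candidate values for f(1), from -1 to 2 in steps of 1/8 (exact arithmetic).
candidates = [Fraction(k, 8) for k in range(-8, 17)]

def minimizers(p):
    """All candidates at which V_p(f)^p is smallest."""
    values = {v: variation_to_the_p(v, p) for v in candidates}
    smallest = min(values.values())
    return [v for v in candidates if values[v] == smallest]

minimizers_p1 = minimizers(1)
minimizers_p2 = minimizers(2)
```

For $p = 1$ every candidate in $[0, 1]$ is a minimizer, while for $p = 2$
only the value $1/2$ is. Since minimizing $V_p$ and minimizing $V_p^p$ are
equivalent, the $p$-th root is omitted.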



9.4 Truncations 

Given real numbers $c$, $d$, consider the functions $\tau_1(s)$ and
$\tau_2(s)$ on $\mathbf{R}$ given by

(9.28) $\tau_1(s) = \max(s, c)$ and $\tau_2(s) = \min(s, d)$.

It is easy to check that

(9.29) $|\tau_1(s) - \tau_1(t)|, \ |\tau_2(s) - \tau_2(t)| \le |s - t|$ for all $s, t \in \mathbf{R}$

and

(9.30) $|\tau_1(s) - \tau_1(t)| < |s - t|, \quad |\tau_2(u) - \tau_2(v)| < |u - v|$

when $s \ne t$ and $s$ or $t \in (-\infty, c)$, and $u \ne v$ and $u$ or
$v \in (d, \infty)$. Of course

(9.31) $\tau_1(s) = s$ and $\tau_2(u) = u$ when $s \ge c$, $u \le d$.

Let $U \subseteq \mathbf{Z}^n$ be as before, and let $f$ be a real-valued
function on $U$. For each $p \ge 1$, we have that

(9.32) $V_p(\tau_1 \circ f), \ V_p(\tau_2 \circ f) \le V_p(f)$

because of (9.29).

Fix a $p \ge 1$, and suppose that $b$ is a real-valued function on
$\partial U$, and that $f$ is a real-valued function on $U$ which agrees with
$b$ on $\partial U$ and for which $V_p(f)$ is as small as possible. If
$b(x) \ge c$ for all $x \in \partial U$, then $f(x) \ge c$ for all
$x \in U$. Indeed, if $f(x)$ were strictly less than $c$ for any
$x \in \operatorname{Int} U$, then we would have

(9.33) $V_p(\tau_1 \circ f) < V_p(f)$,

by (9.30). On the other hand, $\tau_1 \circ f = \tau_1 \circ b = b$ on
$\partial U$, since $b(x) \ge c$ for all $x \in \partial U$. This shows that
$f$ would not minimize $V_p$ among real-valued functions on $U$ that agree
with $b$ on $\partial U$. Thus $f(x) \ge c$ for all $x \in U$.

Similarly, if $b(x) \le d$ for all $x \in \partial U$, then we may conclude
that $f(x) \le d$ for all $x \in U$.
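
The effect of truncation on the $p$-variation can be checked on a small
example. The following Python sketch assumes $U = \{0, 1, 2, 3, 4\}$ in
$\mathbf{Z}$, takes $V_p(f)$ to be the $p$-th root of the sum of
$|f(x) - f(y)|^p$ over interior points $x$ and their neighbors $y$, and uses
a sample function whose boundary values lie in $[c, d]$ but which leaves this
interval inside. The data and the function names are illustrative.

```python
def p_variation_interval(f, p):
    """V_p(f) for U = {0, ..., len(f) - 1}; Int U = {1, ..., len(f) - 2}."""
    total = 0.0
    for x in range(1, len(f) - 1):
        for y in (x - 1, x + 1):
            total += abs(f[x] - f[y]) ** p
    return total ** (1.0 / p)

def truncate_below(f, c):
    """tau_1 o f, with tau_1(s) = max(s, c)."""
    return [max(s, c) for s in f]

def truncate_above(f, d):
    """tau_2 o f, with tau_2(s) = min(s, d)."""
    return [min(s, d) for s in f]

# Boundary values 0 and 2 lie in [c, d], but f leaves [c, d] inside U.
f = [0.0, -1.0, 3.0, 1.0, 2.0]
c, d = 0.0, 2.0
lifted = truncate_below(f, c)         # [0, 0, 3, 1, 2]
clipped = truncate_above(lifted, d)   # [0, 0, 2, 1, 2]
```

Both truncations keep the boundary values and strictly decrease $V_p$ here,
so the sample function cannot be a minimizer for its boundary values.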

In the complex case, one can consider analogous "truncation" mappings
as follows. Let $H$ be a closed half-plane in $\mathbf{C}$, and let $L$ be
the line which is the boundary of $H$. Define
$\theta : \mathbf{C} \to \mathbf{C}$ by taking $\theta(z) = z$ when
$z \in H$, and $\theta(z)$ to be the orthogonal projection of $z$ onto $L$
when $z \in \mathbf{C} \setminus H$. As before one has

(9.34) $|\theta(z) - \theta(w)| \le |z - w|$ for all $z, w \in \mathbf{C}$.

Hence

(9.35) $V_p(\theta \circ f) \le V_p(f)$

for any $p \ge 1$ and any complex-valued function $f$ on $U$.

The strict inequality (9.30) does not work now in the same way as before,
because $|\theta(z) - \theta(w)| = |z - w|$ when
$z, w \in \mathbf{C} \setminus H$ lie in a line parallel to $L$. However,

(9.36) $|\theta(z) - \theta(w)| < |z - w|$

does hold when one of $z$, $w$ lies in $H$ and the other lies in
$\mathbf{C} \setminus H$. This is enough to show that if $b$ is a
complex-valued function on $\partial U$ with values in $H$, and if $f$ is a
complex function on $U$ which is equal to $b$ on $\partial U$ and for which
$V_p(f)$ is as small as possible (for some fixed $p \ge 1$), then $f$ also
takes values in $H$. That is, if $f$ did not take values in $H$, then one
could consider $\theta \circ f$, which would still agree with $b$ on
$\partial U$. In this case one can again get
$V_p(\theta \circ f) < V_p(f)$, contradicting the minimality of $V_p(f)$, but
one has to be slightly more careful than before. (The main point is that if
there is an $x \in U$ such that $f(x) \in \mathbf{C} \setminus H$ in these
circumstances, then there is in fact an $x \in \operatorname{Int} U$ such
that $f(x) \in \mathbf{C} \setminus H$ and $f(y) \in H$ for some
$y \in N(x)$. For this $x$ and $y$ one obtains

(9.37) $|\theta(f(x)) - \theta(f(y))| < |f(x) - f(y)|$

as above, and hence that $V_p(\theta \circ f) < V_p(f)$.)

Since this works for arbitrary half-planes $H$ in $\mathbf{C}$, one can use
this to show that the values of a minimizing function $f$ lie in the convex
hull of the boundary values.

Here is a variant of this type of argument. Let $r$ be a positive real number,
and define $\sigma : \mathbf{C} \to \mathbf{C}$ by

(9.38) $\sigma(z) = z$ when $|z| \le r$, $\quad \sigma(z) = r z / |z|$ when $|z| \ge r$.

One can again show that

(9.39) $|\sigma(z) - \sigma(w)| \le |z - w|$ for all $z, w \in \mathbf{C}$,

and that

(9.40) $|\sigma(z) - \sigma(w)| < |z - w|$ when $z \ne w$ and $|z|$ or $|w| > r$.

(One may prefer to look at these inequalities in terms of differentials of
$\sigma$, rather than computing directly.) As before,
$V_p(\sigma \circ f) \le V_p(f)$ for any $p \ge 1$ and any complex-valued
function $f$ on $U$, and if $f$ minimizes $V_p$ among functions with
prescribed boundary values $b$ on $\partial U$, and if $|b(x)| \le r$ for all
$x \in \partial U$, then we obtain that $|f(x)| \le r$ for all $x \in U$.
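
The two inequalities for the radial retraction can be tested numerically.
The Python sketch below takes $r = 1$ and samples points on circles of radii
0.5, 0.9, 1.5, and 3 about the origin; the sample set and the names
`worst_ratio` and `strict_everywhere` are assumptions made for illustration.

```python
import cmath

def sigma(z, r):
    """Radial retraction of C onto the closed disk of radius r."""
    return z if abs(z) <= r else r * z / abs(z)

r = 1.0
# Sample points on a few circles about the origin, inside and outside the disk.
samples = [rho * cmath.exp(2j * cmath.pi * k / 12)
           for rho in (0.5, 0.9, 1.5, 3.0) for k in range(12)]

worst_ratio = 0.0          # largest value of |sigma(z) - sigma(w)| / |z - w|
strict_everywhere = True   # strict decrease whenever z or w lies outside
for z in samples:
    for w in samples:
        if z == w:
            continue
        before = abs(z - w)
        after = abs(sigma(z, r) - sigma(w, r))
        worst_ratio = max(worst_ratio, after / before)
        if (abs(z) > r or abs(w) > r) and not after < before:
            strict_everywhere = False
```

On this sample the ratio never exceeds 1, and it is strictly less than 1 as
soon as one of the two points lies outside the disk.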



Chapter 10 
Groups 



10.1 General notions 

Let $G$ be a finite group. Thus $G$ is a finite set with a distinguished
element $e$, called the identity element, and a binary operation
$(g, h) \mapsto g h$ from $G \times G$ into $G$, such that the following
three conditions are satisfied. First, the operation is associative, which
means that

(10.1) $(g h) k = g (h k)$

for all $g$, $h$, and $k$ in $G$. Second,

(10.2) $e g = g e = g$

for all $g$ in $G$, which makes precise the sense in which $e$ is the identity
element. Third, for each $g$ in $G$ there is a $g^{-1}$ in $G$ such that

(10.3) $g^{-1} g = g g^{-1} = e$.

It is easy to see that $g^{-1}$ is uniquely determined by $g$, and it is
called the inverse of $g$.

Let $\mathcal{F}(G)$ denote the vector space of functions on $G$. More
precisely, one can consider the real vector space of real-valued functions on
$G$, or the complex vector space of complex-valued functions on $G$. We may
write $\mathcal{F}^r(G)$ for the former and $\mathcal{F}^c(G)$ for the latter,
to specify one or the other.

For each $p \in [1, \infty]$, we can define a norm $\|\cdot\|_p$ on
$\mathcal{F}(G)$ in the usual way. That is, we set

(10.4) $\|f\|_p = \Bigl( \sum_{h \in G} |f(h)|^p \Bigr)^{1/p}$

when $1 \le p < \infty$, and

(10.5) $\|f\|_\infty = \max\{|f(h)| : h \in G\}$.

If $g$ is an element of $G$, define the corresponding left translation
operator $L_g : \mathcal{F}(G) \to \mathcal{F}(G)$ by

(10.6) $L_g(f)(h) = f(g^{-1} h)$

for any function $f(h)$ in $\mathcal{F}(G)$. Notice that

(10.7) $\|L_g(f)\|_p = \|f\|_p$

for all $g$ in $G$, for all $f$ in $\mathcal{F}(G)$, and all $p$,
$1 \le p \le \infty$. This uses the fact that $h \mapsto g^{-1} h$ defines a
permutation on $G$ for each $g$ in $G$.

In general, a norm $\|\cdot\|$ on $\mathcal{F}(G)$ is said to be invariant
under left translations if

(10.8) $\|L_g(f)\| = \|f\|$

for all $g$ in $G$ and all $f$ in $\mathcal{F}(G)$.

Similarly, for each $g$ in $G$, define the corresponding right translation
operator $R_g : \mathcal{F}(G) \to \mathcal{F}(G)$ by

(10.9) $R_g(f)(h) = f(h g)$

for every function $f(h)$ in $\mathcal{F}(G)$. Again

(10.10) $\|R_g(f)\|_p = \|f\|_p$

for all $g$ in $G$, for all $f$ in $\mathcal{F}(G)$, and all $p$,
$1 \le p \le \infty$. A norm $\|\cdot\|$ on $\mathcal{F}(G)$ is said to be
invariant under right translations if

(10.11) $\|R_g(f)\| = \|f\|$

for all $g$ in $G$ and all $f$ in $\mathcal{F}(G)$.

Observe that

(10.12) $L_g \circ L_k = L_{gk}$ and $R_g \circ R_k = R_{gk}$

for all $g$ and $k$ in $G$. Also,

(10.13) $L_g \circ R_k = R_k \circ L_g$

for all $g$, $k$.
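
These identities can be verified by brute force on a small noncommutative
group. The Python sketch below assumes $G$ is the symmetric group on three
letters, with elements stored as tuples; the helper names `mul`, `inv`,
`left`, `right`, and `norm` are illustrative.

```python
from itertools import permutations

# Elements of S_3 as tuples p, with p[i] the image of i; (g h)(i) = g[h[i]].
G = list(permutations(range(3)))

def mul(g, h):
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    result = [0, 0, 0]
    for i, image in enumerate(g):
        result[image] = i
    return tuple(result)

def left(g, f):
    """L_g(f)(h) = f(g^{-1} h)."""
    return {h: f[mul(inv(g), h)] for h in G}

def right(g, f):
    """R_g(f)(h) = f(h g)."""
    return {h: f[mul(h, g)] for h in G}

def norm(f, p):
    return sum(abs(v) ** p for v in f.values()) ** (1.0 / p)

# An arbitrary test function on G with distinct values.
f = {h: float(index * index - 3) for index, h in enumerate(G)}

composition_rules_hold = all(
    left(g, left(k, f)) == left(mul(g, k), f)
    and right(g, right(k, f)) == right(mul(g, k), f)
    and left(g, right(k, f)) == right(k, left(g, f))
    for g in G for k in G
)
norms_preserved = all(
    abs(norm(left(g, f), p) - norm(f, p)) < 1e-12
    and abs(norm(right(g, f), p) - norm(f, p)) < 1e-12
    for g in G for p in (1, 2, 3)
)
```

The inverse in the definition of the left translation is what makes the
composition rule for left translations come out in the same order as for
right translations.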
10.2 Some operators on $\mathcal{F}(G)$

Let $a$ be a function on $G$, and consider the operators $S_a$, $T_a$ on
$\mathcal{F}(G)$ defined by

(10.14) $S_a = \sum_{h \in G} a(h) \, L_h, \qquad T_a = \sum_{h \in G} a(h) \, R_h$.

In other words,

(10.15) $S_a(f)(g) = \sum_{h \in G} a(h) \, f(h^{-1} g), \qquad T_a(f)(g) = \sum_{h \in G} a(h) \, f(g h)$.

For $1 \le p \le \infty$, we have that

(10.16) $\|S_a(f)\|_p = \Bigl\| \sum_{h \in G} a(h) \, L_h(f) \Bigr\|_p \le \sum_{h \in G} |a(h)| \, \|L_h(f)\|_p = \|a\|_1 \, \|f\|_p$

and, similarly,

(10.17) $\|T_a(f)\|_p \le \|a\|_1 \, \|f\|_p$.

Thus, as operators on $\mathcal{F}(G)$ equipped with the norm $\|\cdot\|_p$,
$S_a$ and $T_a$ have norm less than or equal to $\|a\|_1$.

Now suppose that $p = 1$. Let us write $\delta_e$ for the function on $G$
such that $\delta_e(e) = 1$ and $\delta_e(h) = 0$ when $h \ne e$. Then

(10.18) $S_a(\delta_e)(k) = a(k)$

and

(10.19) $T_a(\delta_e)(k) = a(k^{-1})$,

as one can verify. It follows that the operator norms of $S_a$ and $T_a$ on
$\mathcal{F}(G)$ equipped with the norm $\|\cdot\|_1$ are equal to
$\|a\|_1$, since $\|\delta_e\|_1 = 1$.
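
The action of $S_a$ and $T_a$ on the delta function at the identity, and the
bound by $\|a\|_1$ when $p = 1$, can be checked directly. The Python sketch
below again assumes $G$ is the symmetric group on three letters and uses
integer-valued test data so that all arithmetic is exact; the data and helper
names are illustrative.

```python
from itertools import permutations

G = list(permutations(range(3)))          # the group S_3
identity = (0, 1, 2)

def mul(g, h):
    """Composition of permutations: (g h)(i) = g[h[i]]."""
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    """Inverse permutation: position j holds the i with g[i] = j."""
    return tuple(sorted(range(3), key=lambda i: g[i]))

def S(a, f):
    """S_a(f)(g) = sum over h of a(h) f(h^{-1} g)."""
    return {g: sum(a[h] * f[mul(inv(h), g)] for h in G) for g in G}

def T(a, f):
    """T_a(f)(g) = sum over h of a(h) f(g h)."""
    return {g: sum(a[h] * f[mul(g, h)] for h in G) for g in G}

def norm1(f):
    return sum(abs(v) for v in f.values())

# Integer-valued test data, so that all arithmetic below is exact.
a = {h: (-1) ** index * (index + 1) for index, h in enumerate(G)}
f = {h: 2 * index - 5 for index, h in enumerate(G)}
delta_e = {h: 1 if h == identity else 0 for h in G}

S_delta = S(a, delta_e)
T_delta = T(a, delta_e)
```

Here `S_delta` reproduces `a`, while `T_delta` is `a` composed with
inversion, and in both cases the norm $\|a\|_1$ is attained on `delta_e`.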

Exercise 10.20 The Hilbert-Schmidt norms of $S_a$ and $T_a$ on
$\mathcal{F}(G)$ with respect to the norm $\|\cdot\|_2$ are equal to
$\operatorname{Order}(G)^{1/2} \, \|a\|_2$, where $\operatorname{Order}(G)$
is the number of elements of $G$.



10.3 Commutative groups 

In this section we assume that $G$ is a finite commutative group, so that

(10.21) $g h = h g$

for all $g$, $h$ in $G$. We shall also make the standing assumption that
$\mathcal{F}(G)$ is the complex vector space of complex-valued functions on
$G$.

The norm $\|\cdot\|_2$ on $\mathcal{F}(G)$ is associated to the inner product

(10.22) $\langle f_1, f_2 \rangle = \sum_{g \in G} f_1(g) \, \overline{f_2(g)}$.

This inner product is preserved by the translation operators, which is to say
that they define unitary operators on $\mathcal{F}(G)$ equipped with
$\langle f_1, f_2 \rangle$.

As in Section 6.4, a unitary operator on a complex inner product space
can be diagonalized in an orthogonal basis. The translation operators are all
unitary on $\mathcal{F}(G)$ (with this choice of inner product), and so they
can each be diagonalized in an orthonormal basis. A key point now is that the
translation operators all commute with each other, since the group $G$ is
commutative. This implies that

(10.23) there is an orthogonal basis of $\mathcal{F}(G)$ in which all the
        translation operators are simultaneously diagonalized,

as in Section 6.8. The key point is that an eigenspace of one of the operators
is invariant under the other operators (and one can make use of the
information that the operators are unitary here to adjust the argument
somewhat).

What do the elements of such an orthogonal basis look like? In other
words, if $f$ is an element of $\mathcal{F}(G)$ which is an eigenvector of
all of the translation operators, then what can we say about $f$? One can
show that $f$ is an eigenvector of all of the translation operators if and
only if there is a complex number $c$ and a nonzero complex-valued function
$\chi$ on $G$ such that $f = c \, \chi$ and

(10.24) $\chi(g h) = \chi(g) \, \chi(h)$

for all $g$ and $h$ in $G$. (Exercise. Observe also that $|\chi(g)| = 1$ for
all $g$.)

As part of the story of eigenvectors of the translation operators, one gets
that distinct functions $\chi$ on $G$ which satisfy (10.24) are orthogonal in
$\mathcal{F}(G)$, using the inner product above. It is a good exercise to
extract a direct proof of this statement.

In this situation, operators of the form $S_a$ and $T_a$ from the preceding
section are diagonalized by the same basis as the translation operators, since
$S_a$ and $T_a$ are, after all, linear combinations of the translation
operators. Specifically, if $\chi$ is a nonzero complex-valued function on
$G$ which satisfies (10.24), then

(10.25) $S_a(\chi) = \Bigl( \sum_{h \in G} a(h) \, \chi(h^{-1}) \Bigr) \chi$,

and there is an analogous formula for $T_a(\chi)$. As a consequence,

(10.26) $\|S_a\|_{op,22} = \max_{\chi} \Bigl| \sum_{h \in G} a(h) \, \chi(h^{-1}) \Bigr|$,

where $\|\cdot\|_{op,22}$ denotes the operator norm of a linear operator on
$\mathcal{F}(G)$ with respect to the norm $\|\cdot\|_2$ on $\mathcal{F}(G)$,
and the maximum is taken over all nonzero complex-valued functions $\chi$ on
$G$ which satisfy (10.24). This is consistent with the fact that

(10.27) $\|S_a\|_{op,22} \le \|a\|_1$,

mentioned in Section 10.2, because $|\chi(h)| = 1$ for all $h$ in $G$. Note
that $\|S_a\|_{op,22}$ can be strictly less than $\|a\|_1$.



10.4 Special cases 

Let $n$ be a positive integer, and let $G$ be the group consisting of
$0, 1, \ldots, n - 1$ with the group operation given by "modular addition".
In other words, if $j$, $k$ are two integers with $0 \le j, k \le n - 1$,
then the group operation simply adds $j$ and $k$ when $j + k \le n - 1$, and
otherwise it adds $j$ and $k$ and then subtracts $n$ to get back within the
range from 0 to $n - 1$. It is not hard to check that this is a commutative
group.

In this case one can check that the functions $\chi$ on $G$ as in the previous
section are of the form

(10.28) $\chi(j) = \alpha^j$,

where $\alpha$ is a complex number which is an $n$th root of unity, i.e.,
$\alpha^n = 1$. This includes $\alpha = 1$, and $\alpha$'s which might be
$m$th roots of unity for an $m$ which is smaller than $n$ (and which divides
$n$). There are exactly $n$ such roots of unity, which fits with the fact
that $\mathcal{F}(G)$ has dimension $n$. These are versions of "complex
exponentials".
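
For the cyclic group these statements are easy to test numerically. The
Python sketch below assumes $n = 6$ and an arbitrary function `a` on $G$, and
checks that the functions $j \mapsto \alpha^j$ are multiplicative, mutually
orthogonal, and eigenvectors of the operator $f \mapsto \sum_h a(h) \, f(g - h)$
with multiplier $\sum_h a(h) \, \chi(-h)$. All names in the code are
illustrative.

```python
import cmath

n = 6
G = range(n)

def character(m):
    """chi_m(j) = alpha^j with alpha = exp(2 pi i m / n), an nth root of 1."""
    alpha = cmath.exp(2j * cmath.pi * m / n)
    return [alpha ** j for j in G]

def inner(f1, f2):
    """Standard Hermitian inner product on functions on G."""
    return sum(u * v.conjugate() for u, v in zip(f1, f2))

def S(a, f):
    """S_a(f)(g) = sum over h of a(h) f(g - h mod n); here h^{-1} g = g - h."""
    return [sum(a[h] * f[(g - h) % n] for h in G) for g in G]

characters = [character(m) for m in range(n)]
a = [1.0, -2.0, 0.5, 0.0, 3.0, 1.0]   # an arbitrary function on G

def close(u, v, tol=1e-9):
    return abs(u - v) < tol

multiplicative = all(close(chi[(j + k) % n], chi[j] * chi[k])
                     for chi in characters for j in G for k in G)
orthogonal = all(close(inner(characters[m], characters[l]), n if m == l else 0)
                 for m in range(n) for l in range(n))

def multiplier(chi):
    """Sum over h of a(h) chi(h^{-1}), with h^{-1} = -h mod n."""
    return sum(a[h] * chi[(-h) % n] for h in G)

eigenvectors = all(close(S(a, chi)[g], multiplier(chi) * chi[g])
                   for chi in characters for g in G)
largest_multiplier = max(abs(multiplier(chi)) for chi in characters)
norm1_a = sum(abs(v) for v in a)
```

The largest multiplier in absolute value is the operator norm with respect
to $\|\cdot\|_2$, and it is bounded by `norm1_a`, as expected.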

Now suppose that $G$ is the set of $l$-tuples $x = (x_1, x_2, \ldots, x_l)$
where each $x_j$ is either 1 or $-1$. Given two such $l$-tuples $x$, $y$,
define $x y$ by

(10.29) $x y = (x_1 y_1, x_2 y_2, \ldots, x_l y_l)$,

using ordinary multiplication in each of the coordinates. This also defines a
commutative group.

For $i = 1, 2, \ldots, l$, define $\rho_i$ on $G$ by $\rho_i(x) = x_i$. Thus
$\rho_i(x y) = \rho_i(x) \, \rho_i(y)$ for all $x, y \in G$. These are some of
the functions that we want, but not all.

If $I$ is a subset of the set $\{1, 2, \ldots, l\}$, define $w_I$ on $G$ by

(10.30) $w_I = \prod_{i \in I} \rho_i$.

We interpret this as $w_I(x) = 1$ for all $x$ in $G$ when $I = \emptyset$.
These functions $w_I$ are all distinct, and all satisfy
$w_I(x y) = w_I(x) \, w_I(y)$. They account for all functions of this type.

The functions $w_I$ are versions of "Walsh functions". The functions $\rho_i$
are versions of "Rademacher functions".
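
The multiplicativity and orthogonality of these functions can be checked by
enumeration. The Python sketch below assumes tuples of length 3 and indexes
coordinates from 0 rather than 1; the names `ell`, `walsh`, and `inner` are
illustrative.

```python
from itertools import combinations, product

ell = 3
G = list(product((1, -1), repeat=ell))     # all tuples of +-1 of length ell

def multiply(x, y):
    """Coordinatewise product of two tuples."""
    return tuple(xi * yi for xi, yi in zip(x, y))

def walsh(I):
    """w_I(x) = product of x_i over i in I; the empty product is 1."""
    def w(x):
        value = 1
        for i in I:
            value *= x[i]
        return value
    return w

# All subsets of {0, ..., ell - 1}, as sorted tuples of indices.
subsets = [I for size in range(ell + 1)
           for I in combinations(range(ell), size)]
walsh_functions = {I: walsh(I) for I in subsets}

multiplicative = all(w(multiply(x, y)) == w(x) * w(y)
                     for w in walsh_functions.values() for x in G for y in G)

def inner(u, v):
    """Inner product of two real-valued functions on G."""
    return sum(u(x) * v(x) for x in G)

orthogonal = all(inner(walsh_functions[I], walsh_functions[J])
                 == (len(G) if I == J else 0)
                 for I in subsets for J in subsets)
```

There are as many subsets as group elements, so the orthogonality check also
confirms that these functions form a basis for the functions on $G$.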

10.5 Groups of matrices 

Let $n$ be a positive integer. The group $GL(n, \mathbf{R})$ consists of all
invertible $n \times n$ matrices with real entries, where the group operation
is of course matrix multiplication.

One can identify an $n \times n$ real matrix with a linear mapping on
$\mathbf{R}^n$. The group $GL(n, \mathbf{R})$ acts on the vector space of
polynomials on $\mathbf{R}^n$, through

(10.31) $P(x) \mapsto P(T^{-1}(x))$.

That is, if $P(x)$ is a polynomial on $\mathbf{R}^n$ and $T$ is an invertible
linear mapping on $\mathbf{R}^n$, then the action of $T$ on the polynomial
$P(x)$ is defined to be $P(T^{-1}(x))$. If $S$ is another invertible linear
mapping on $\mathbf{R}^n$, then the action of $S T$ on $P(x)$ is
$P((S T)^{-1}(x)) = P(T^{-1}(S^{-1}(x)))$, which is the same as first taking
the action of $T$ on $P(x)$ and then taking the action of $S$ on the
polynomial resulting from the first step.

If $d$ is a nonnegative integer, then a polynomial $P(x)$ on $\mathbf{R}^n$
is said to be homogeneous of degree $d$ if

(10.32) $P(t x) = t^d \, P(x)$

for all real numbers $t$ and all $x \in \mathbf{R}^n$. This is the same as
saying that one can write $P(x)$ as a linear combination of monomials
$x_1^{j_1} x_2^{j_2} \cdots x_n^{j_n}$, where $\sum_{i=1}^n j_i = d$ for each
term. Every polynomial on $\mathbf{R}^n$ can be written as a sum of
homogeneous polynomials, where the homogeneous components are unique.
The subspaces of homogeneous polynomials are mapped to themselves by
the action of $GL(n, \mathbf{R})$.

Now let us consider the group $O(n)$, which is the subgroup of
$GL(n, \mathbf{R})$ consisting of matrices whose inverses are equal to their
transposes. These matrices correspond to orthogonal linear transformations on
$\mathbf{R}^n$, i.e., linear transformations which preserve the standard
inner product on $\mathbf{R}^n$ (as in Chapter 6). For these matrices, the
action on polynomials can be refined. Indeed, the polynomials
$|x|^{2k} = (\sum_{j=1}^n x_j^2)^k$ are invariant under the action of $O(n)$.
This action also preserves the class of harmonic polynomials, which are the
polynomials $P(x)$ such that

(10.33) $\sum_{i=1}^n \frac{\partial^2}{\partial x_i^2} P(x) = 0$.

These two properties of the action of $O(n)$ on polynomials on $\mathbf{R}^n$
are actually "dual" to each other in a natural way, as in Section IV.2 in
[SteW2] and Section III.3 of [Ste2].


In a somewhat different direction, suppose that we consider
$GL(m, \mathbf{C})$, which is the group of $m \times m$ invertible matrices
with complex entries. These matrices can be identified with invertible linear
transformations on $\mathbf{C}^m$, and one can also view them as defining
invertible real linear transformations on $\mathbf{R}^{2m}$. In this way one
can think of $GL(m, \mathbf{C})$ as a subgroup of $GL(2m, \mathbf{R})$.

For the action on polynomials, it will be helpful to consider polynomials
with complex coefficients. A "holomorphic" polynomial on $\mathbf{C}^m$ is
one that can be written as a linear combination of products of the complex
coordinate functions $z_1, \ldots, z_m$, and these are preserved by the action
of $GL(m, \mathbf{C})$. Arbitrary polynomials, which are not necessarily
holomorphic, can be expressed as a linear combination of products of the
complex coordinate functions $z_1, \ldots, z_m$ and their complex conjugates
$\bar{z}_1, \ldots, \bar{z}_m$. If we identify $\mathbf{C}^m$ with
$\mathbf{R}^{2m}$, as a real vector space, then these are the usual
polynomials on $\mathbf{R}^{2m}$ with complex coefficients, written in a
slightly different way. Normally one might write these polynomials in terms
of linear combinations of products of the real and imaginary parts of the
$z_j$'s, and this is equivalent to taking linear combinations of products of
the $z_j$'s and their complex conjugates.

The group $GL(2m, \mathbf{R})$ acts on the full space of (not necessarily
holomorphic) polynomials on $\mathbf{C}^m \cong \mathbf{R}^{2m}$, and the
subspaces of homogeneous polynomials of various degrees are preserved by this
action, as before. For $GL(m, \mathbf{C})$, the subspace of holomorphic
polynomials is also preserved, and, more generally, the subspace spanned by
monomials which are products of $a$ $z_j$'s and $b$ $\bar{z}_l$'s is
preserved for each $a$ and $b$.

Inside $GL(m, \mathbf{C})$ there is the subgroup $U(m)$ of matrices whose
inverse is given by their adjoint (the complex conjugate of the transpose).
These are the matrices corresponding to linear transformations on
$\mathbf{C}^m$ that are unitary, which is to say that they preserve the
standard Hermitian inner product on $\mathbf{C}^m$, as in Chapter 6. If we
view $GL(m, \mathbf{C})$ as a subgroup of $GL(2m, \mathbf{R})$, as above,
then $U(m)$ is the same as the intersection of $GL(m, \mathbf{C})$ with
$O(2m)$ (as one can verify).

These groups and their actions on polynomials on $\mathbf{R}^n$,
$\mathbf{C}^m$ (and some other vector spaces) are fundamental in mathematics.
In particular, noncommutativity shows up. This is reflected in the limited
number of 1-dimensional spaces of polynomials which are preserved by the
various actions.



Chapter 11 



Some special families of 
functions 



11.1 Rademacher functions 

Given a positive integer $j$, the $j$th Rademacher function $r_j(x)$ is the
function on the unit interval $[0, 1)$ defined by

(11.1) $r_j(x) = (-1)^i$ when $x \in [i \, 2^{-j}, (i + 1) \, 2^{-j})$,

where $i$ runs through all nonnegative integers such that $i + 1 \le 2^j$.
Here are some basic properties of $r_j(x)$:

(11.2) $|r_j(x)| = 1$ for all $x \in [0, 1)$;

(11.3) $r_j(x)$ is constant on dyadic subintervals of $[0, 1)$ of length
       $2^{-j}$;

(11.4) $\int_L r_j(x) \, dx = 0$ for all dyadic subintervals $L$ of $[0, 1)$
       of length greater than $2^{-j}$.

The latter comes from the way that $r_j(x)$ alternates between 1 and $-1$ on
dyadic intervals of length $2^{-j}$.

Lemma 11.5 Suppose that $j_1, j_2, \ldots, j_n$ are distinct positive
integers. Then

(11.6) $\int_{[0,1)} r_{j_1}(x) \, r_{j_2}(x) \cdots r_{j_n}(x) \, dx = 0$.

To see this, we may as well assume that $j_n$ is the largest of the $j_l$'s.
This implies that each of the other $r_{j_l}$'s is constant on dyadic
subintervals of $[0, 1)$ of length $2^{-(j_n - 1)}$, as in (11.3). However,
the integral of $r_{j_n}$ over any dyadic subinterval of $[0, 1)$ of length
$2^{-j_n + 1}$ is 0, by (11.4), and the same must be true for the product of
all of the $r_{j_l}$'s, since the others are constant on these dyadic
subintervals. One can combine these integrals for all dyadic subintervals of
$[0, 1)$ of length $2^{-j_n + 1}$ to get (11.6).
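
Since products of Rademacher functions are step functions, the integrals in
the lemma reduce to finite sums and can be computed exactly. The following
Python sketch does this with rational arithmetic for indices up to 5; the
range of indices and the function names are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

def rademacher(j, x):
    """r_j(x) = (-1)^i for x in [i 2^-j, (i + 1) 2^-j), with x in [0, 1)."""
    i = int(x * 2 ** j)          # index of the dyadic interval containing x
    return -1 if i % 2 else 1

def integral_of_product(js):
    """Exact integral over [0, 1) of the product of r_j for j in js.

    The product is constant on dyadic intervals of length 2^-J, J = max(js),
    so the integral is a finite average over left endpoints.
    """
    J = max(js)
    total = 0
    for i in range(2 ** J):
        x = Fraction(i, 2 ** J)
        value = 1
        for j in js:
            value *= rademacher(j, x)
        total += value
    return Fraction(total, 2 ** J)

indices = range(1, 6)
# Products of one, two, or three distinct Rademacher functions integrate to 0.
lemma_holds = all(integral_of_product(js) == 0
                  for size in range(1, 4)
                  for js in combinations(indices, size))
# Each r_j has square identically 1, hence integral 1.
squares_integrate_to_one = all(integral_of_product((j, j)) == 1
                               for j in indices)
```

Together the two checks say that these Rademacher functions are orthonormal
with respect to the usual integral inner product.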

Proposition 11.7 For each positive real number $p$, there is a constant
$C(p) > 0$ so that the following is true:

(11.8) $C(p)^{-1} \Bigl( \sum_{j=1}^m |a_j|^2 \Bigr)^{1/2} \le \Bigl( \int_{[0,1)} \Bigl| \sum_{j=1}^m a_j \, r_j(x) \Bigr|^p dx \Bigr)^{1/p} \le C(p) \Bigl( \sum_{j=1}^m |a_j|^2 \Bigr)^{1/2}$

for all positive integers $m$ and all choices of real numbers
$a_1, a_2, \ldots, a_m$.

When $p = 2$ one can take $C(p) = 1$, i.e.,

(11.9) $\int_{[0,1)} \Bigl| \sum_{j=1}^m a_j \, r_j(x) \Bigr|^2 dx = \sum_{j=1}^m |a_j|^2$.

This follows from the fact that the $r_j$'s are orthonormal functions, with
respect to the usual integral inner product. The orthonormality of the
$r_j$'s can be obtained from (11.2) and Lemma 11.5.

Next, let us recall that if $p$ and $q$ are positive real numbers such that
$p \le q$, then

(11.10) $\Bigl( \int_{[0,1)} |f(x)|^p \, dx \Bigr)^{1/p} \le \Bigl( \int_{[0,1)} |f(x)|^q \, dx \Bigr)^{1/q}$

for any function $f(x)$ on $[0, 1)$. This is a consequence of Jensen's
inequality (3.23). Thus we get that

(11.11) $\Bigl( \int_{[0,1)} \Bigl| \sum_{j=1}^m a_j \, r_j(x) \Bigr|^p dx \Bigr)^{1/p} \le \Bigl( \sum_{j=1}^m |a_j|^2 \Bigr)^{1/2}$

when $p \le 2$, and

(11.12) $\Bigl( \sum_{j=1}^m |a_j|^2 \Bigr)^{1/2} \le \Bigl( \int_{[0,1)} \Bigl| \sum_{j=1}^m a_j \, r_j(x) \Bigr|^p dx \Bigr)^{1/p}$

when $p \ge 2$, by (11.9).

Now let us show that for each $p > 2$, there is a constant $C(p) > 0$ so that

(11.13) $\Bigl( \int_{[0,1)} \Bigl| \sum_{j=1}^m a_j \, r_j(x) \Bigr|^p dx \Bigr)^{1/p} \le C(p) \Bigl( \sum_{j=1}^m |a_j|^2 \Bigr)^{1/2}$

for all $a_1, a_2, \ldots, a_m \in \mathbf{R}$. It is enough to do this when
$p$ is an even integer, because of (11.10) (which implies that (11.13) holds
when $p$ is replaced by any smaller number when it holds for $p$).

When $p$ is an even integer, one can expand out

(11.14) $\Bigl( \sum_{j=1}^m a_j \, r_j(x) \Bigr)^p$

algebraically, into a $p$-fold sum

(11.15) $\sum_{j_1=1}^m \sum_{j_2=1}^m \cdots \sum_{j_p=1}^m a_{j_1} a_{j_2} \cdots a_{j_p} \, r_{j_1}(x) \, r_{j_2}(x) \cdots r_{j_p}(x)$.

When this is integrated in $x$ over $[0, 1)$, most of the terms drop out,
because of Lemma 11.5. In fact, this will be true whenever there is an integer
$j$ which occurs among $j_1, j_2, \ldots, j_p$ an odd number of times. Indeed,
the product

(11.16) $r_{j_1}(x) \, r_{j_2}(x) \cdots r_{j_p}(x)$

can be reduced to one in which no $r_l(x)$ occurs more than once, since
$r_l(x)^2 = 1$ for all $l$. If there is an integer $j$ which occurs an odd
number of times among $j_1, j_2, \ldots, j_p$, then the reduced product will
be nontrivial (containing a factor of $r_j(x)$), and the integral of it in $x$
over $[0, 1)$ vanishes, by Lemma 11.5.

Thus one is left with the terms in which every integer $j$ occurs among
$j_1, j_2, \ldots, j_p$ an even number of times (zero times for some $j$'s).
In this case the product (11.16) reduces to the function which is identically
1 on $[0, 1)$, and whose integral over $[0, 1)$ is simply 1. The product of
the coefficients

(11.17) $a_{j_1} a_{j_2} \cdots a_{j_p}$

becomes a product of squares of $a_j$'s, for $p/2$ $j$'s (which may include
repetitions of some $j$'s).

In the end,

(11.18) $\int_{[0,1)} \Bigl( \sum_{j=1}^m a_j \, r_j(x) \Bigr)^p dx$

becomes a sum of products of $p/2$ $a_j^2$'s. It is not hard to bound this by
a constant times

(11.19) $\Bigl( \sum_{j=1}^m a_j^2 \Bigr)^{p/2}$,

which is exactly what is needed for (11.13).
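
As a worked instance of this counting, consider $p = 4$. A 4-tuple of indices
in which every index occurs an even number of times either has all four
entries equal, or has two distinct values each occurring twice, and the
latter can be arranged in 6 ways for each unordered pair of values. The
resulting value $3^{1/4}$ for the constant is just what this computation
yields, not a claim about the best constant.

```latex
\[
  \int_{[0,1)} \Bigl( \sum_{j=1}^m a_j \, r_j(x) \Bigr)^4 dx
  = \sum_{j=1}^m a_j^4 + 3 \sum_{j \ne k} a_j^2 \, a_k^2
  \le 3 \Bigl( \sum_{j=1}^m a_j^2 \Bigr)^2 ,
\]
\[
  \text{so that} \quad
  \Bigl( \int_{[0,1)} \Bigl| \sum_{j=1}^m a_j \, r_j(x) \Bigr|^4 dx \Bigr)^{1/4}
  \le 3^{1/4} \Bigl( \sum_{j=1}^m a_j^2 \Bigr)^{1/2} .
\]
```

Here the sum over $j \ne k$ runs over ordered pairs, which is why the factor
is 3 rather than 6.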

It remains to show that for each p < 2 there is a constant C(p) > 0 so
that

(11.20)    $C(p)^{-1} \Big(\sum_{j=1}^{m} |a_j|^2\Big)^{1/2} \le \Big(\int_{[0,1)} \Big|\sum_{j=1}^{m} a_j\, r_j(x)\Big|^{p}\, dx\Big)^{1/p}.$

We can derive this from the p > 2 case, as follows.

Fix $p \in (0, 2)$. For any function f(x) on [0, 1), we have that

(11.21)    $\Big(\int_{[0,1)} |f(x)|^2\, dx\Big)^{1/2} \le \Big(\int_{[0,1)} |f(x)|^4\, dx\Big)^{a/4} \Big(\int_{[0,1)} |f(x)|^p\, dx\Big)^{(1-a)/p},$

where $a \in (0, 1)$ is chosen so that 1/2 = a/4 + (1 - a)/p. This is a standard
consequence of Hölder's inequality, which one might rewrite as

(11.22)    $\int_{[0,1)} |f(x)|^2\, dx \le \Big(\int_{[0,1)} \big(|f(x)|^{2a}\big)^{q}\, dx\Big)^{1/q} \Big(\int_{[0,1)} \big(|f(x)|^{2(1-a)}\big)^{r}\, dx\Big)^{1/r},$

where 1/q = a/2 and 1/r = 2(1 - a)/p.

Suppose now that $f(x) = \sum_{j=1}^{m} a_j\, r_j(x)$, as in (11.20). From (11.13) with
p replaced by 4 and (11.9) we have that

(11.23)    $\Big(\int_{[0,1)} |f(x)|^4\, dx\Big)^{1/4} \le C(4) \Big(\int_{[0,1)} |f(x)|^2\, dx\Big)^{1/2}.$

We can combine this with (11.21) to obtain

(11.24)    $\Big(\int_{[0,1)} |f(x)|^2\, dx\Big)^{1/2} \le C(4)^{a} \Big(\int_{[0,1)} |f(x)|^2\, dx\Big)^{a/2} \Big(\int_{[0,1)} |f(x)|^p\, dx\Big)^{(1-a)/p}.$

This and (11.9) imply (11.20), with $C(p) = C(4)^{a/(1-a)}$. This completes the
proof of Proposition 11.7.
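For concreteness, here is a sketch of the bookkeeping behind the last step. It starts from the combined inequality, divides by the factor $(\int |f|^2)^{a/2}$ (assumed nonzero), and then raises both sides to the power 1/(1 - a). The specialization to p = 1 at the end is only an illustration.

```latex
\Big(\int_{[0,1)} |f|^2\,dx\Big)^{1/2}
  \le C(4)^{a}\,\Big(\int_{[0,1)} |f|^2\,dx\Big)^{a/2}\Big(\int_{[0,1)} |f|^p\,dx\Big)^{(1-a)/p}
\;\Longrightarrow\;
\Big(\int_{[0,1)} |f|^2\,dx\Big)^{(1-a)/2}
  \le C(4)^{a}\,\Big(\int_{[0,1)} |f|^p\,dx\Big)^{(1-a)/p}
\;\Longrightarrow\;
\Big(\int_{[0,1)} |f|^2\,dx\Big)^{1/2}
  \le C(4)^{a/(1-a)}\,\Big(\int_{[0,1)} |f|^p\,dx\Big)^{1/p}.

% Solving 1/2 = a/4 + (1-a)/p for a:
a = \frac{2(2-p)}{4-p}, \qquad 1-a = \frac{p}{4-p}, \qquad \frac{a}{1-a} = \frac{2(2-p)}{p}.

% Example: p = 1 gives a = 2/3 and a/(1-a) = 2, so C(1) = C(4)^{2}.
```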



11.2 Linear functions on spheres 

Let n be a positive integer, and let $S^{n-1}$ denote the unit sphere in $\mathbf{R}^n$, i.e.,

(11.25)    $S^{n-1} = \{x \in \mathbf{R}^n : |x| = 1\},$

where |x| denotes the standard Euclidean norm of x, $|x| = \big(\sum_{i=1}^{n} x_i^2\big)^{1/2}$.

Let $\mathcal{L}(S^{n-1})$ denote the set of functions on $S^{n-1}$ which are the restrictions
to $S^{n-1}$ of linear functions on $\mathbf{R}^n$. Specifically, a linear function on $\mathbf{R}^n$ is a
function f(x) which can be given by $f(x) = \langle x, v \rangle$, where $v \in \mathbf{R}^n$ and $\langle u, w \rangle$
is the standard inner product on $\mathbf{R}^n$,

(11.26)    $\langle u, w \rangle = \sum_{i=1}^{n} u_i w_i, \qquad u, w \in \mathbf{R}^n.$

Thus $\mathcal{L}(S^{n-1})$ is a real vector space of dimension n.

If p is a positive real number and $f \in \mathcal{L}(S^{n-1})$, consider the quantity

(11.27)    $\Big(\frac{1}{\nu_{n-1}} \int_{S^{n-1}} |f(x)|^p\, dx\Big)^{1/p},$

where dx is the standard volume element of integration on $S^{n-1}$ and $\nu_{n-1}$ is
the total volume of $S^{n-1}$. (When n = 1, $S^0 = \{1, -1\}$, the integral above
should be replaced by the sum over these two points, and $\nu_0 = 2$.)

We may suppose that f(x) is given as $t \langle x, u \rangle$, where $u \in \mathbf{R}^n$ satisfies
|u| = 1 and $t \in \mathbf{R}$. Then (11.27) reduces to

(11.28)    $|t| \Big(\frac{1}{\nu_{n-1}} \int_{S^{n-1}} |\langle x, u \rangle|^p\, dx\Big)^{1/p}.$

A key point now is that this integral does not depend on u. In other words,
(11.27) is equal to

(11.29)    $|t| \Big(\frac{1}{\nu_{n-1}} \int_{S^{n-1}} |x_1|^p\, dx\Big)^{1/p},$

where $x_1$ denotes the first coordinate of x. This is because the volume element
of integration on $S^{n-1}$ is invariant under orthogonal transformations on $\mathbf{R}^n$,
which can be used to change from u to any other unit vector. Here we are
using the first standard basis vector (with first coordinate 1 and the rest 0).

Thus we may conclude that if $p_1$, $p_2$ are positive real numbers, then there
is a positive constant $C(p_1, p_2, n)$ such that

(11.30)    $\Big(\frac{1}{\nu_{n-1}} \int_{S^{n-1}} |f(x)|^{p_1}\, dx\Big)^{1/p_1} \le C(p_1, p_2, n) \Big(\frac{1}{\nu_{n-1}} \int_{S^{n-1}} |f(x)|^{p_2}\, dx\Big)^{1/p_2}$

for all $f \in \mathcal{L}(S^{n-1})$.

Concerning the p = 2 case, let us note the orthogonality conditions

(11.31)    $\int_{S^{n-1}} x_i x_j\, dx = 0$ when $1 \le i, j \le n$, $i \ne j$.

11.3 Linear functions, continued 

Let n be a positive integer, and consider expressions of the form

(11.32)    $\Big(\frac{1}{\mu_n} \int_{\mathbf{R}^n} |f(y)|^p\, e^{-|y|^2}\, dy\Big)^{1/p},$

where $0 < p < \infty$, f(y) is a function on $\mathbf{R}^n$, and

(11.33)    $\mu_n = \int_{\mathbf{R}^n} e^{-|y|^2}\, dy.$

Suppose that f(x) is a linear function on $\mathbf{R}^n$, $f(x) = t \langle x, u \rangle$, where t is a
real number and u is a unit vector in $\mathbf{R}^n$. Then (11.32) is equal to

(11.34)    $|t| \Big(\frac{1}{\mu_n} \int_{\mathbf{R}^n} |\langle y, u \rangle|^p\, e^{-|y|^2}\, dy\Big)^{1/p}.$

As in the previous section, one can apply rotation invariance to obtain that
this integral does not depend on the choice of unit vector u. Thus (11.32) is
the same as

(11.35)    $|t| \Big(\frac{1}{\mu_n} \int_{\mathbf{R}^n} |y_1|^p\, e^{-|y|^2}\, dy\Big)^{1/p}.$

We also have that

(11.36)    $\frac{1}{\mu_n} \int_{\mathbf{R}^n} |y_1|^p\, e^{-|y|^2}\, dy = \frac{1}{\mu_1} \int_{\mathbf{R}} |z|^p\, e^{-|z|^2}\, dz.$

This implies that (11.32) is equal to

(11.37)    $|t| \Big(\frac{1}{\mu_1} \int_{\mathbf{R}} |z|^p\, e^{-|z|^2}\, dz\Big)^{1/p}.$

Hence we obtain that if $p_1$, $p_2$ are positive real numbers, then there is a
positive constant $C_1(p_1, p_2)$ such that

(11.38)    $\Big(\frac{1}{\mu_n} \int_{\mathbf{R}^n} |f(y)|^{p_1}\, e^{-|y|^2}\, dy\Big)^{1/p_1} \le C_1(p_1, p_2) \Big(\frac{1}{\mu_n} \int_{\mathbf{R}^n} |f(y)|^{p_2}\, e^{-|y|^2}\, dy\Big)^{1/p_2}$

for all positive integers n and all linear functions f on $\mathbf{R}^n$.
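The one-dimensional integral in (11.37) can be evaluated in closed form, which makes the dimension-free comparison of norms explicit. The sketch below relies on two standard identities that are not stated in the text: $\int_{\mathbf{R}} |z|^p e^{-z^2}\, dz = \Gamma((p+1)/2)$ and $\mu_1 = \sqrt{\pi}$. The function names are hypothetical.

```python
import math

def gaussian_norm(p):
    """The factor multiplying |t| in (11.37): (1/mu_1 * int |z|^p e^{-z^2} dz)^(1/p)."""
    return (math.gamma((p + 1) / 2) / math.sqrt(math.pi)) ** (1 / p)

def norm_ratio(p1, p2):
    """Ratio of the p1-expression to the p2-expression for a nonzero linear function.

    The factor |t| cancels, so the ratio depends on neither t, u, nor n.
    """
    return gaussian_norm(p1) / gaussian_norm(p2)

print(gaussian_norm(2))     # the second moment is 1/2, so this is sqrt(1/2)
print(norm_ratio(4, 2))     # the fourth moment is 3/4, so this is 3 ** 0.25
```

Because both sides of the comparison are exact multiples of |t|, this ratio is the smallest constant that can be used for the given pair of exponents.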

Note that for each positive real number p and each function h(y) on $\mathbf{R}^n$
which is homogeneous of degree p, so that $h(ry) = r^p\, h(y)$ for all $y \in \mathbf{R}^n$
and r > 0, the integral of h on $S^{n-1}$ is equal to a constant times the
integral of h(y) times $e^{-|y|^2}$ on $\mathbf{R}^n$, where the constant depends on p and n
but not on h. (This works as well for p > -n, and for other integrable
radial functions besides $e^{-|y|^2}$.) In particular, this applies to $h(y) = |f(y)|^p$
when f(y) is linear in y.

Let us also note the orthogonality conditions

(11.39)    $\int_{\mathbf{R}^n} y_i y_j\, e^{-|y|^2}\, dy = 0$ when $1 \le i, j \le n$, $i \ne j$.

11.4 Lacunary sums, p = 4

Let n be a positive integer and $a_{-n}, a_{-n+1}, \ldots, a_{n-1}, a_n$ be complex numbers,
and consider the function

(11.40)    $f(x) = \sum_{j=-n}^{n} a_j \exp(2 \pi i j x)$

on [0, 1] (or as a periodic function on $\mathbf{R}$, with period 1). Here exp z denotes
the usual exponential function, also written $e^z$. Recall that

(11.41)    $\int_0^1 |f(x)|^2\, dx = \sum_{j=-n}^{n} |a_j|^2.$

This follows from the orthonormality of the functions $\exp(2 \pi i j x)$ with respect
to the inner product

(11.42)    $\langle g, h \rangle = \int_0^1 g(x)\, \overline{h(x)}\, dx.$

This orthonormality itself reduces to the fact that

(11.43)    $\int_0^1 \exp(2 \pi i k x)\, dx = 1$ when k = 0,
           $= 0$ when $k \ne 0$.

(For $k \ne 0$, note that $\exp(2 \pi i k x)$ is the derivative of $(2 \pi i k)^{-1} \exp(2 \pi i k x)$.)
Now let us look at a kind of lacunary sum, of the form

(11.44)    $\phi(x) = \sum_{j=0}^{m} c_j \exp(2 \pi i\, 2^j x),$

where m is a positive integer and $c_0, \ldots, c_m$ are complex numbers. From
(11.41) we have that

(11.45)    $\int_0^1 |\phi(x)|^2\, dx = \sum_{j=0}^{m} |c_j|^2.$

Consider

(11.46)    $\int_0^1 |\phi(x)|^4\, dx.$

We can write

(11.47)    $|\phi(x)|^4 = \phi(x)^2\, \overline{\phi(x)}^{\,2}$

and multiply out the sums to obtain

(11.48)    $|\phi(x)|^4 = \sum_{j_1=0}^{m} \sum_{j_2=0}^{m} \sum_{j_3=0}^{m} \sum_{j_4=0}^{m} c_{j_1} c_{j_2} \overline{c_{j_3}}\, \overline{c_{j_4}} \exp\big(2 \pi i\, (2^{j_1} + 2^{j_2} - 2^{j_3} - 2^{j_4})\, x\big).$

Hence

(11.49)    $\int_0^1 |\phi(x)|^4\, dx = \sum \big\{ c_{j_1} c_{j_2} \overline{c_{j_3}}\, \overline{c_{j_4}} : 0 \le j_1, j_2, j_3, j_4 \le m,\ 2^{j_1} + 2^{j_2} - 2^{j_3} - 2^{j_4} = 0 \big\}.$

Lemma 11.50 Suppose that $j_1, j_2, j_3, j_4$ are nonnegative integers such that
$2^{j_1} + 2^{j_2} - 2^{j_3} - 2^{j_4} = 0$. Then either $j_1 = j_3$ and $j_2 = j_4$, or $j_1 = j_4$ and
$j_2 = j_3$ (or both, so that $j_1 = j_2 = j_3 = j_4$).

(Exercise.)

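Using the lemma we obtain that

(11.51)    $\int_0^1 |\phi(x)|^4\, dx \le 2 \Big(\sum_{j_1=0}^{m} |c_{j_1}|^2\Big) \Big(\sum_{j_2=0}^{m} |c_{j_2}|^2\Big) = 2 \Big(\sum_{j=0}^{m} |c_j|^2\Big)^2.$

This can be rewritten as

(11.52)    $\int_0^1 |\phi(x)|^4\, dx \le 2 \Big(\int_0^1 |\phi(y)|^2\, dy\Big)^2.$

There are numerous extensions and variants of these results.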
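The counting argument of this section can be checked by brute force for small m. The sketch below verifies the conclusion of Lemma 11.50 for a range of exponents and evaluates the sum over admissible index quadruples that gives the integral of $|\phi|^4$; the function names and the sample coefficients are hypothetical.

```python
from itertools import product

def lemma_11_50_holds(max_j):
    """Check the conclusion of Lemma 11.50 for all exponents in 0..max_j."""
    for j1, j2, j3, j4 in product(range(max_j + 1), repeat=4):
        if 2 ** j1 + 2 ** j2 == 2 ** j3 + 2 ** j4:
            if not ((j1 == j3 and j2 == j4) or (j1 == j4 and j2 == j3)):
                return False
    return True

def fourth_moment(c):
    """Sum of c_j1 c_j2 conj(c_j3) conj(c_j4) over quadruples with 2^j1 + 2^j2 = 2^j3 + 2^j4."""
    total = 0
    for j1, j2, j3, j4 in product(range(len(c)), repeat=4):
        if 2 ** j1 + 2 ** j2 == 2 ** j3 + 2 ** j4:
            total += c[j1] * c[j2] * c[j3].conjugate() * c[j4].conjugate()
    return total

c = [1 + 0j, 1j, 2 + 0j]                  # sample coefficients c_0, c_1, c_2
second = sum(abs(z) ** 2 for z in c)      # integral of |phi|^2
print(lemma_11_50_holds(6), fourth_moment(c), 2 * second ** 2)
```

The quadruples with $j_1 = j_2$ are counted once rather than twice, which is why the fourth moment is in general strictly smaller than twice the square of the second moment.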



Chapter 12 
Maximal functions 



12.1 Definitions and basic properties 

Let f be a function on [0, 1). Define the dyadic maximal function M(f) on
[0, 1) by

(12.1)    $M(f)(x) = \sup_{k \ge 0} |E_k(f)(x)|,$

where $E_k(f)$ is as in Section 2.2. More precisely, the supremum in (12.1) is
taken over all nonnegative integers k.

This definition of M(f) is equivalent to setting

(12.2)    $M(f)(x) = \sup \Big\{ \Big| \frac{1}{|J|} \int_J f(y)\, dy \Big| : J \text{ is a dyadic subinterval of } [0, 1) \text{ and } x \in J \Big\}.$

The equivalence of (12.1) and (12.2) is not hard to check, since $E_k(f)$ is
defined in terms of averages over dyadic intervals, as in (2.7), and all dyadic
intervals arise in this manner. If J is a dyadic subinterval of [0, 1), then its
length is of the form $2^{-k}$ for some nonnegative integer k, and this gives the
correspondence between the J's in (12.2) and the k's in (12.1).
Given a nonnegative integer l, define $M_l(f)$ on [0, 1) by

(12.3)    $M_l(f)(x) = \sup_{0 \le k \le l} |E_k(f)(x)|.$

As above, we are implicitly restricting ourselves to k's which are integers
here. This is equivalent to taking

(12.4)    $M_l(f)(x) = \sup \Big\{ \Big| \frac{1}{|J|} \int_J f(y)\, dy \Big| : J \text{ is a dyadic subinterval of } [0, 1),\ x \in J, \text{ and } |J| \ge 2^{-l} \Big\}.$

Observe that

(12.5)    $M_l(f) \le M_p(f)$ when $p \ge l$

and

(12.6)    $M(f)(x) = \sup_{l \ge 0} M_l(f)(x)$ for all $x \in [0, 1)$.

For any pair of functions $f_1$, $f_2$ on [0, 1),

(12.7)    $M(f_1 + f_2) \le M(f_1) + M(f_2)$

and

(12.8)    $M_l(f_1 + f_2) \le M_l(f_1) + M_l(f_2)$

for all $l \ge 0$. Also,

(12.9)    $M(c f) = |c|\, M(f), \qquad M_l(c f) = |c|\, M_l(f)$

when c is a constant. In other words, (12.7), (12.8), and (12.9) say that M(f)
and $M_l(f)$ are sublinear in f.

Lemma 12.10 If f is constant on the dyadic subintervals of [0, 1) of length
$2^{-l}$, then M(f) is also constant on the dyadic subintervals of [0, 1) of length
$2^{-l}$, and $M(f) = M_l(f)$. For any function f,

(12.11)    $M_l(f) = M(E_l(f)),$

and $M_l(f)$ is constant on dyadic intervals of length $2^{-l}$.

(Exercise. One can use part (c) of Lemma 2.9.)

Corollary 12.12 If f is a dyadic step function, then so is M(f), and $M(f) = M_j(f)$
for sufficiently large j.
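For a dyadic step function these definitions reduce to finitely many averages, which the following sketch computes. It assumes that f is given by its values on the $2^l$ dyadic intervals of length $2^{-l}$, and that $E_k(f)(x)$ is the average of f over the dyadic interval of length $2^{-k}$ containing x, as the discussion of the equivalent definition suggests (Section 2.2 is not reproduced here). The function names are hypothetical.

```python
def dyadic_average(values, k):
    """E_k(f) for a step function given by its values on 2**l equal dyadic cells (k <= l)."""
    n = len(values)
    block = n >> k                       # number of cells in a dyadic interval of length 2**-k
    out = []
    for start in range(0, n, block):
        avg = sum(values[start:start + block]) / block
        out.extend([avg] * block)
    return out

def maximal_function(values):
    """Values of M(f) on each cell; by Lemma 12.10 it suffices to use levels 0..l."""
    n = len(values)
    depth = n.bit_length() - 1           # l, assuming n == 2**l
    levels = [dyadic_average(values, k) for k in range(depth + 1)]
    return [max(abs(level[i]) for level in levels) for i in range(n)]

# f = 4 on [0, 1/4) and 0 elsewhere: the averages over [0,1), [0,1/2), [0,1/4) are 1, 2, 4
print(maximal_function([4, 0, 0, 0]))
```

In this example M(f) is larger than |f| on the cells where f vanishes, because every point of [0, 1) lies in some dyadic interval that also contains the spike.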
12.2 The size of the maximal function

Let f be a function on [0, 1), and let M(f) be the corresponding dyadic
maximal function, as in the preceding section.

What can we say about M(f)? How does it compare to f, in terms of
overall size? In other words, if f is not too large, can we say that M(f) is
not too large?

Lemma 12.13 (Supremum bound for M(f)) Suppose that A is a nonnegative
real number such that $|f(x)| \le A$ for all $x \in [0, 1)$. Then $M(f)(x) \le A$
for all $x \in [0, 1)$.

This is an easy consequence of the definitions. If $|f(x)| \le A$ for all
$x \in [0, 1)$, then all averages of f have absolute value less than or equal to A.
This implies that $M(f) \le A$ at all points in [0, 1), since M(f) is defined in
terms of suprema of absolute values of averages of f.

This would also work if we assumed that $|f(x)| \le A$ holds on [0, 1) except
for a very small set, like a finite set, a countable set, or, more generally, a
set of measure 0. That is, the behavior of f on a small set like this would
not affect any of the integrals of f, and so one could just as well replace the
values of f with 0 on such a set.

Proposition 12.14 (Weak-type estimate for M(f)) For each positive real
number $\lambda$,

(12.15)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| \le \frac{1}{\lambda} \int_{[0,1)} |f(w)|\, dw.$

The left-hand side of (12.15) refers to the measure of the set in question,
and the proof of the lemma will give a very simple meaning for this.

Let $\lambda > 0$ be given. Define $\mathcal{F}$ to be the collection of dyadic intervals L in
[0, 1) such that

(12.16)    $\Big| \frac{1}{|L|} \int_L f(y)\, dy \Big| > \lambda.$

Let us check that

(12.17)    $\{x \in [0, 1) : M(f)(x) > \lambda\} = \bigcup_{L \in \mathcal{F}} L.$

If $x \in [0, 1)$ and $M(f)(x) > \lambda$, then there is a dyadic interval L in [0, 1) such
that $x \in L$ and L satisfies (12.16), because of (12.2). This shows that the left
side of (12.17) is contained in the right side of (12.17). Conversely, if $L \in \mathcal{F}$,
then

(12.18)    $M(f)(x) \ge \Big| \frac{1}{|L|} \int_L f(y)\, dy \Big| > \lambda$

for all $x \in L$. This shows that L is contained in the left side of (12.17). Hence
the right side of (12.17) is contained in the left side. This proves (12.17).

We may as well assume that the left side of (12.17) is not empty, since
otherwise (12.15) is automatic. This implies that $\mathcal{F}$ is nonempty as well.
Now we apply Lemma 2.5 to obtain a subcollection $\mathcal{F}_0$ of $\mathcal{F}$ such that

(12.19)    $\bigcup_{L \in \mathcal{F}_0} L = \bigcup_{L \in \mathcal{F}} L$

and the intervals in $\mathcal{F}_0$ are pairwise disjoint. Combining these properties with
(12.17), we obtain that

(12.20)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| = \sum_{L \in \mathcal{F}_0} |L|.$

For the proof of (12.15), we do not need equality in (12.20), but only the
inequality $\le$. This inequality does not require the disjointness of the intervals
in $\mathcal{F}_0$, but this will be needed in a moment.

Remark 12.21 If f is a dyadic step function, then the set $\{x \in [0, 1) : M(f)(x) > \lambda\}$
can be given as a union of finitely many dyadic intervals.
More precisely, one could use only intervals L with size $|L| \ge 2^{-j}$ for some
j, because of Corollary 12.12.

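Now let us look at the right side of (12.20). Each interval L in the sum
satisfies (12.16), which we can rewrite as

(12.22)    $|L| < \frac{1}{\lambda} \Big| \int_L f(y)\, dy \Big|.$

Hence

(12.23)    $\sum_{L \in \mathcal{F}_0} |L| < \sum_{L \in \mathcal{F}_0} \frac{1}{\lambda} \Big| \int_L f(y)\, dy \Big| \le \sum_{L \in \mathcal{F}_0} \frac{1}{\lambda} \int_L |f(y)|\, dy = \frac{1}{\lambda} \int_{\bigcup_{L \in \mathcal{F}_0} L} |f(y)|\, dy.$

The last step uses the disjointness of the intervals L in $\mathcal{F}_0$.

The combination of (12.20) and (12.23) implies (12.15). In fact, we get
the slightly more precise inequality that

(12.24)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| \le \frac{1}{\lambda} \int_{\{y \in [0,1) : M(f)(y) > \lambda\}} |f(y)|\, dy,$

using (12.17) and (12.19).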
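The covering argument can be traced on a small example. The sketch below selects the maximal dyadic intervals on which the average of f exceeds a threshold in absolute value, adds up their lengths, and compares the result with the weak-type bound. It assumes that f is a dyadic step function with integer values listed on $2^l$ equal cells, and it uses exact rational arithmetic; the function name and the pair encoding of intervals are hypothetical.

```python
from fractions import Fraction

def maximal_intervals(values, lam):
    """Maximal dyadic intervals whose average exceeds lam in absolute value.

    The pair (k, index) stands for the interval [index * 2**-k, (index + 1) * 2**-k).
    """
    n = len(values)
    depth = n.bit_length() - 1           # l, assuming n == 2**l
    chosen = []

    def visit(k, index):
        block = n >> k
        segment = values[index * block:(index + 1) * block]
        if abs(Fraction(sum(segment), block)) > lam:
            chosen.append((k, index))    # maximal, since no ancestor qualified
            return
        if k < depth:
            visit(k + 1, 2 * index)
            visit(k + 1, 2 * index + 1)

    visit(0, 0)
    return chosen

values = [4, 0, 0, 0, 0, 0, 0, 8]
lam = Fraction(2)
family = maximal_intervals(values, lam)
measure = sum(Fraction(1, 2 ** k) for k, _ in family)               # total length of the family
bound = Fraction(sum(abs(v) for v in values), len(values)) / lam    # (1/lam) * integral of |f|
print(family, measure, bound)
```

Searching from the largest interval downward and stopping at the first interval that qualifies yields a pairwise disjoint family directly, which is the role played by the disjoint subcollection in the proof.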



12.3 Some variations 

Instead of dyadic intervals (and averages of functions over them, etc.), one 
can also look at arbitrary intervals in the real line. For these, it is not as 
easy to have disjointness, or reduce to that case. However, there is a simple 
substitute, indicated by the following lemmas. 

Lemma 12.25 (3 intervals to 2) Suppose that I, J, and K are intervals
in the real line such that there is a point $x \in \mathbf{R}$ which lies in all three of
them. Then one of the intervals is contained in the union of the other two.

This is not hard to check. One of the intervals will go as far as possible
to the left of x, another of the intervals (perhaps the same one) will go as
far as possible to the right of x, and the union of these two intervals (which
may be the same interval) will contain all of $I \cup J \cup K$.

Lemma 12.26 (No point counted more than twice) Let $\mathcal{A}$ be an arbitrary
finite collection of intervals in $\mathbf{R}$. There is a subcollection $\mathcal{A}_1$ of $\mathcal{A}$ with
the following properties:

(12.27)    $\bigcup_{J \in \mathcal{A}_1} J = \bigcup_{J \in \mathcal{A}} J;$

(12.28)    no point in $\mathbf{R}$ belongs to more than two intervals in $\mathcal{A}_1$.

This can be derived from Lemma 12.25. Specifically, one starts with $\mathcal{A}$
itself, and one leaves it alone if it already satisfies (12.28). If not, there is a
point in $\mathbf{R}$ which lies in 3 intervals in $\mathcal{A}$, and one can throw away one of the
intervals without changing the total union, by Lemma 12.25. One repeats
this process until there are no points in $\mathbf{R}$ which lie in 3 intervals remaining
in the collection (i.e., that have not been thrown away). This has to happen
eventually, since we assumed that $\mathcal{A}$ is finite. The resulting collection can be
used for $\mathcal{A}_1$.
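The thinning procedure can be sketched as follows for closed intervals given as pairs (a, b) with a <= b. This is a slight variant of the process described above, not a transcription of it: it discards any interval that is covered by the union of two of the others, whether or not three intervals share a point. That may discard more intervals than the proof does, but it keeps the union unchanged, and by Lemma 12.25 it can only stop once no point lies in three of the remaining intervals. The function names are hypothetical.

```python
def covered_by_two(i, j, k):
    """True if the closed interval i lies inside the union of the closed intervals j and k."""
    (a, b), (c, d), (e, f) = i, j, k
    if max(c, e) <= min(d, f):           # j and k meet, so their union is a single interval
        return min(c, e) <= a and b <= max(d, f)
    return (c <= a and b <= d) or (e <= a and b <= f)

def thin_out(intervals):
    """Remove redundant intervals until none is covered by two of the others."""
    kept = list(intervals)
    changed = True
    while changed:
        changed = False
        for idx, i in enumerate(kept):
            others = kept[:idx] + kept[idx + 1:]
            if any(covered_by_two(i, j, k) for j in others for k in others):
                kept.pop(idx)            # the union of the collection is unchanged
                changed = True
                break
    return kept

def multiplicity(intervals, x):
    """Number of intervals in the collection that contain the point x."""
    return sum(1 for a, b in intervals if a <= x <= b)

kept = thin_out([(0, 4), (1, 5), (2, 6), (3, 7)])
print(kept, max(multiplicity(kept, a) for a, _ in kept))
```

For closed intervals the largest multiplicity is attained at some left endpoint, which is why the last line only samples those points.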

One can rephrase (12.28) in terms of indicator functions, as follows:

(12.29)    $\sum_{J \in \mathcal{A}_1} \mathbf{1}_J(x) \le 2 \cdot \mathbf{1}_{\bigcup_{J \in \mathcal{A}_1} J}(x)$ for all $x \in \mathbf{R}$.

If f is a nonnegative function on $\mathbf{R}$, one can integrate (12.29) with f to get

(12.30)    $\sum_{J \in \mathcal{A}_1} \int_J f(x)\, dx \le 2 \int_{\bigcup_{J \in \mathcal{A}_1} J} f(x)\, dx.$

This could be used in a setting like that of the last step in (12.23). Of course

(12.31)    $\sum_{J \in \mathcal{A}_1} \int_J f(x)\, dx \ge \int_{\bigcup_{J \in \mathcal{A}_1} J} f(x)\, dx$

holds automatically (so that the equality in the last step in (12.23) corresponds
to these two inequalities, going in opposite directions).

These observations are special to dimension 1, although there are well-known
substitutes for other settings (like $\mathbf{R}^n$, n > 1).

In $\mathbf{R}^n$, n > 1, one can also consider dyadic cubes, which behave in much
the same manner as dyadic intervals in $\mathbf{R}$. A dyadic cube in $\mathbf{R}^n$ is a Cartesian
product of n dyadic intervals in $\mathbf{R}$ of the same size. As above, one can focus on
dyadic subcubes of the unit cube, which is the Cartesian product of n copies
of the unit interval [0, 1). (This is not a serious restriction, however.) With
dyadic cubes, one has much the same kind of disjointness and partitioning
properties as for dyadic intervals.

12.4 More on the size of the maximal function

Let f be a function on [0, 1), as before.

Proposition 12.32 For each real number p > 1,

(12.33)    $\int_{[0,1)} M(f)(x)^p\, dx \le \frac{2^p\, p}{p - 1} \int_{[0,1)} |f(y)|^p\, dy.$






To prove this, we shall use Lemma 12.13 and Proposition 12.14. In fact,
we shall only need the inequalities in these lemmas, together with the fact
that M(f) is sublinear in f, as in (12.7) and (12.9).

Let us first state and prove two lemmas that we shall use (and which are
of broader concern).

Lemma 12.34 For each $\lambda > 0$,

(12.35)    $|\{x \in [0, 1) : M(f)(x) > 2\lambda\}| \le \frac{1}{\lambda} \int_{\{u \in [0,1) : |f(u)| > \lambda\}} |f(u)|\, du.$

This is analogous to (12.15) in Proposition 12.14, except that we have
replaced $\lambda$ by $2\lambda$ on the left side of the inequality (making it smaller), and we
have restricted the domain of integration on the right side. (The appearance
of this factor of 2 will come from simple arithmetic, rather than depending
on the details of the situation, e.g., as with other settings as in Section 12.3.)

Let $\lambda > 0$ be given, and define the functions $f_1(x)$, $f_2(x)$ on [0, 1) from
the function f(x) by setting

(12.36)    $f_1(x) = f(x)$ when $|f(x)| \le \lambda$, $\quad f_1(x) = 0$ when $|f(x)| > \lambda$,

and

(12.37)    $f_2(x) = f(x)$ when $|f(x)| > \lambda$, $\quad f_2(x) = 0$ when $|f(x)| \le \lambda$.

Thus $f = f_1 + f_2$.

Because $|f_1(x)| \le \lambda$ for all x, we have that $M(f_1)(x) \le \lambda$ for all x as well,
as in Lemma 12.13. This implies that

(12.38)    $M(f)(x) \le \lambda + M(f_2)(x)$

for all $x \in [0, 1)$, since $M(f) \le M(f_1) + M(f_2)$, as in (12.7). Therefore

(12.39)    $\{x \in [0, 1) : M(f)(x) > 2\lambda\} \subseteq \{x \in [0, 1) : M(f_2)(x) > \lambda\},$

and hence

(12.40)    $|\{x \in [0, 1) : M(f)(x) > 2\lambda\}| \le |\{x \in [0, 1) : M(f_2)(x) > \lambda\}|.$

On the other hand, we can apply Proposition 12.14 with f replaced by $f_2$
to get that

(12.41)    $|\{x \in [0, 1) : M(f_2)(x) > \lambda\}| \le \frac{1}{\lambda} \int_{[0,1)} |f_2(u)|\, du.$

The right side of this inequality is equal to the right side of (12.35), by the
definition of $f_2$, and so (12.35) follows from this and (12.40). This proves
Lemma 12.34.
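The decomposition used in this proof is easy to write down for a function given by a list of values. The following minimal sketch (with a hypothetical function name) splits f at a given height into a bounded part and a large part.

```python
def split_at_height(values, lam):
    """Return (f1, f2): f1 keeps the values with |f| <= lam, f2 keeps those with |f| > lam."""
    f1 = [v if abs(v) <= lam else 0 for v in values]
    f2 = [v if abs(v) > lam else 0 for v in values]
    return f1, f2

f = [4, -1, 0, 3]
f1, f2 = split_at_height(f, 2)
print(f1, f2)   # f1 is bounded by 2 in absolute value, and f1 + f2 recovers f
```

Only the part above the threshold contributes to the right side of the lemma, which is how the restricted domain of integration arises.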

Lemma 12.42 Let g(x) be a nonnegative real-valued function on [0, 1), and
let p be a positive real number. Then

(12.43)    $\int_{[0,1)} g(x)^p\, dx = \int_0^\infty p\, \lambda^{p-1}\, |\{x \in [0, 1) : g(x) > \lambda\}|\, d\lambda.$

This is not at all special to the unit interval, but works in a very general
way.

There is a nice (and well-known) geometric way to look at (12.43). Consider
the set

(12.44)    $\{(x, \lambda) \in \mathbf{R}^2 : x \in [0, 1),\ 0 < \lambda < g(x)\}.$

We want to integrate the function $p\, \lambda^{p-1}$ on this set. (This function does
not depend on x, but the region on which we are integrating does.) If we
integrate first in $\lambda$, and afterwards in x, we get the left side of (12.43). If
instead we integrate in x first, and afterwards in $\lambda$, then we obtain the right
side of (12.43). This proves Lemma 12.42.
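For a nonnegative step function both sides of this identity are finite sums, so it can be checked directly. The sketch below assumes that g is given by its values on equal-length cells; between consecutive heights the measure of the set where g exceeds the threshold is constant, so the integral in the threshold variable can be done exactly on each piece. The function names are hypothetical.

```python
def direct_integral(values, p):
    """Integral of g**p over [0, 1) for a step function with equal-length cells."""
    return sum(v ** p for v in values) / len(values)

def layer_cake_integral(values, p):
    """Integral of p * lam**(p-1) * |{g > lam}| over lam in (0, infinity)."""
    n = len(values)
    heights = sorted(set(values) | {0})
    total = 0.0
    for low, high in zip(heights, heights[1:]):
        # |{g > lam}| is constant for low <= lam < high
        measure = sum(1 for v in values if v > low) / n
        total += measure * (high ** p - low ** p)
    return total

g = [1, 3, 3, 0]
print(direct_integral(g, 2.5), layer_cake_integral(g, 2.5))
```

The two printed numbers agree up to rounding, reflecting the two orders of integration over the region under the graph of g.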

Now let us apply this formula to the proof of Proposition 12.32. Thus we
obtain that

(12.45)    $\int_{[0,1)} M(f)(x)^p\, dx = \int_0^\infty p\, \lambda^{p-1}\, |\{x \in [0, 1) : M(f)(x) > \lambda\}|\, d\lambda.$

If we apply (12.35) to the right side (with $\lambda$ replaced by $\lambda/2$), then we get

(12.46)    $\int_{[0,1)} M(f)(x)^p\, dx \le \int_0^\infty p\, \lambda^{p-1} \Big( \frac{2}{\lambda} \int_{\{u \in [0,1) : |f(u)| > \lambda/2\}} |f(u)|\, du \Big)\, d\lambda.$

Let us rewrite this as

(12.47)    $\int_{[0,1)} M(f)(x)^p\, dx \le \int_0^\infty \int_{\{u \in [0,1) : |f(u)| > \lambda/2\}} 2p\, \lambda^{p-2}\, |f(u)|\, du\, d\lambda.$

If we interchange the order of integration on the right side, we obtain

(12.48)    $\int_{[0,1)} M(f)(x)^p\, dx \le \int_{[0,1)} \int_0^{2|f(u)|} 2p\, \lambda^{p-2}\, |f(u)|\, d\lambda\, du.$

Because p > 1, by assumption (in Proposition 12.32), this reduces to

(12.49)    $\int_{[0,1)} M(f)(x)^p\, dx \le \int_{[0,1)} 2p\, (p-1)^{-1}\, (2|f(u)|)^{p-1}\, |f(u)|\, du = \frac{2^p\, p}{p - 1} \int_{[0,1)} |f(u)|^p\, du.$

This gives (12.33), as desired.

The argument used here is an instance of Marcinkiewicz interpolation of
operators. Variants of this will be employed in Chapter 0.



Chapter 13 
Square functions 



13.1 S-functions

Let f be a function on [0, 1). Define the associated square function S(f) on
[0, 1) by

(13.1)    $S(f)(x) = \Big( |E_0(f)(x)|^2 + \sum_{j=1}^{\infty} |E_j(f)(x) - E_{j-1}(f)(x)|^2 \Big)^{1/2},$

where $E_j(f)$ is as in Section 2.2. For each nonnegative integer l, define $S_l(f)$
on [0, 1) by

(13.2)    $S_l(f)(x) = \Big( |E_0(f)(x)|^2 + \sum_{j=1}^{l} |E_j(f)(x) - E_{j-1}(f)(x)|^2 \Big)^{1/2}.$

If l = 0, then the sum in (13.2) is interpreted as being 0. Clearly

(13.3)    $S_l(f)(x) \le S_p(f)(x)$ when $l \le p$,

and

(13.4)    $S(f)(x) = \sup_{l \ge 0} S_l(f)(x),$

where the supremum is implicitly taken over nonnegative integers l.
These square functions are sublinear in f, i.e.,

(13.5)    $S(f_1 + f_2)(x) \le S(f_1)(x) + S(f_2)(x),$

(13.6)    $S_l(f_1 + f_2)(x) \le S_l(f_1)(x) + S_l(f_2)(x)$

and

(13.7)    $S(c f)(x) = |c|\, S(f)(x), \qquad S_l(c f)(x) = |c|\, S_l(f)(x)$

for all functions f, $f_1$, $f_2$, all constants c, and all nonnegative integers l. This
follows from the fact that $\big(\sum_j |a_j|^2\big)^{1/2}$ defines a norm on sequences $\{a_j\}$.

Lemma 13.8 If f is constant on the dyadic subintervals of [0, 1) of length
$2^{-l}$, then $S_l(f)$ is also constant on the dyadic subintervals of [0, 1) of length
$2^{-l}$, and $S(f) = S_l(f)$. For any function f,

(13.9)    $S_l(f) = S(E_l(f)),$

and $S_l(f)$ is constant on dyadic intervals of length $2^{-l}$.

(Exercise. One can use Lemma 2.9.)

Corollary 13.10 If f is a dyadic step function on [0, 1), then so is S(f),
and $S(f) = S_j(f)$ for sufficiently large j.

Lemma 13.11 If f is any function on [0, 1), then

(13.12)    $\int_{[0,1)} S_l(f)(x)^2\, dx = \int_{[0,1)} |E_l(f)(x)|^2\, dx$

for all $l \ge 0$, and

(13.13)    $\int_{[0,1)} S(f)(x)^2\, dx = \int_{[0,1)} |f(x)|^2\, dx.$

As usual, one should at least assume that f is mildly well-behaved, such
as being integrable.

The main point behind Lemma 13.11 is simply that the functions $E_0(f)$,
$E_j(f) - E_{j-1}(f)$, $j \in \mathbf{Z}_+$, are all orthogonal to each other (with respect to
the usual integral inner product). This is not hard to check, using the fact
that $E_j(f) - E_{j-1}(f)$ has integral 0 over every dyadic subinterval of [0, 1) of
length $2^{-j+1}$, and that $E_k(f)$ is constant on dyadic intervals of length $2^{-k}$.
One also uses the formula

(13.14)    $E_l(f) = E_0(f) + \sum_{j=1}^{l} \big( E_j(f) - E_{j-1}(f) \big).$

Remark 13.15 If f(x) is a sum of constant multiples of Rademacher functions
(Section 11.1), then S(f)(x) is constant, and $S(f)(x)^2$ is the sum of the
squares of the absolute values of the coefficients of the Rademacher functions.
Of course, in general, S(f)(x) is not constant.
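The identity between the integral of the squared square function and the integral of $|f|^2$ can be checked exactly on a dyadic step function. The sketch below assumes that f is real-valued with integer values on $2^l$ equal dyadic cells, and that $E_k(f)$ averages f over dyadic intervals of length $2^{-k}$ (Section 2.2 is not reproduced here). It truncates the sum at level l, which is justified for such f by Lemma 13.8. The function names are hypothetical.

```python
from fractions import Fraction

def dyadic_average(values, k):
    """E_k(f) on each cell, as exact fractions."""
    n = len(values)
    block = n >> k                       # cells per dyadic interval of length 2**-k
    out = []
    for start in range(0, n, block):
        avg = Fraction(sum(values[start:start + block]), block)
        out.extend([avg] * block)
    return out

def square_function_squared(values):
    """Values of S(f)**2 on each cell for a real dyadic step function."""
    n = len(values)
    depth = n.bit_length() - 1           # l, assuming n == 2**l
    levels = [dyadic_average(values, k) for k in range(depth + 1)]
    result = []
    for i in range(n):
        total = levels[0][i] ** 2
        for j in range(1, depth + 1):
            total += (levels[j][i] - levels[j - 1][i]) ** 2
        result.append(total)
    return result

f = [4, 0, 0, 0]
s2 = square_function_squared(f)
lhs = sum(s2) / len(f)                            # integral of S(f)^2
rhs = Fraction(sum(v * v for v in f), len(f))     # integral of |f|^2
print(s2, lhs, rhs)
```

Pointwise, $S(f)^2$ and $|f|^2$ differ considerably in this example; only their integrals agree, as the orthogonality argument predicts.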






13.2 Estimates, 1

Proposition 13.16 Let p be a positive real number, p < 2. There is a
constant $C_1(p) > 0$ so that

(13.17)    $\int_{[0,1)} S(f)(x)^p\, dx \le C_1(p) \int_{[0,1)} M(f)(x)^p\, dx$

for any function f on [0, 1).

Let p < 2 and f be given. Also let $\lambda > 0$ be given. We shall try to get
an estimate for

(13.18)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}|,$

and use that to analyze the integral of $S(f)(x)^p$.

Let $\mathcal{F}$ denote the set of dyadic subintervals J of [0, 1) such that

(13.19)    $\Big| \frac{1}{|J|} \int_J f(y)\, dy \Big| > \lambda.$

If $\mathcal{F}$ is empty, then this will be fine for our eventual conclusion, and so we
shall assume that $\mathcal{F}$ is not empty. Define $\mathcal{F}_0$ to be the set of maximal intervals
in $\mathcal{F}$. As in Lemma 2.5, we have that

(13.20)    $\bigcup_{J \in \mathcal{F}_0} J = \bigcup_{J \in \mathcal{F}} J$

and

(13.21)    $J_1 \cap J_2 = \emptyset$ when $J_1, J_2 \in \mathcal{F}_0$, $J_1 \ne J_2$.

Assume that [0, 1) is not an element of $\mathcal{F}_0$. If it is an element of $\mathcal{F}_0$, then
this will also be fine for our eventual conclusion. Let $\mathcal{F}_1$ denote the set of
dyadic subintervals L of [0, 1) such that there is a $J \in \mathcal{F}_0$ such that

(13.22)    $J \subseteq L$ and $|J| = |L|/2$.

Because $\mathcal{F}_0$ consists of maximal intervals in $\mathcal{F}$, each L in $\mathcal{F}_1$ does not lie in
$\mathcal{F}$. Therefore,

(13.23)    $\Big| \frac{1}{|L|} \int_L f(y)\, dy \Big| \le \lambda$

for all $L \in \mathcal{F}_1$.
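The elements of $\mathcal{F}_1$ need not be disjoint, and so we let $\mathcal{F}_{10}$ be a subcollection
of $\mathcal{F}_1$ as in Lemma 2.5. Thus

(13.24)    $\bigcup_{L \in \mathcal{F}_{10}} L = \bigcup_{L \in \mathcal{F}_1} L$

and

(13.25)    $L_1 \cap L_2 = \emptyset$ when $L_1, L_2 \in \mathcal{F}_{10}$, $L_1 \ne L_2$.

Define a function $f_\lambda(x)$ on [0, 1) by

(13.26)    $f_\lambda(x) = \frac{1}{|L|} \int_L f(y)\, dy$ when $x \in L$, $L \in \mathcal{F}_{10}$,
           $f_\lambda(x) = f(x)$ when $x \in [0, 1) \setminus \big( \bigcup_{L \in \mathcal{F}_{10}} L \big)$.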

Lemma 13.27 Let K be a dyadic subinterval of [0, 1) such that K contains
a point x in $[0, 1) \setminus \big( \bigcup_{L \in \mathcal{F}_{10}} L \big)$. Then

(13.28)    $\frac{1}{|K|} \int_K f(u)\, du = \frac{1}{|K|} \int_K f_\lambda(u)\, du.$

(The same is true if K contains an element of $\mathcal{F}_{10}$ as a subinterval.)

(Exercise. Under these conditions, K is equal to the union of the intervals
$L \in \mathcal{F}_{10}$ such that $L \subseteq K$ and the set $K \setminus \bigcup_{L \in \mathcal{F}_{10}} L$, and these sets are all
disjoint.)

Corollary 13.29 If $x \in [0, 1) \setminus \big( \bigcup_{L \in \mathcal{F}_{10}} L \big)$, then $S(f)(x) = S(f_\lambda)(x)$.

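Using this corollary we get that

(13.30)    $\{x \in [0, 1) : S(f)(x) > \lambda\} \subseteq \Big( \bigcup_{L \in \mathcal{F}_{10}} L \Big) \cup \{x \in [0, 1) : S(f_\lambda)(x) > \lambda\}.$

Hence

(13.31)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}| \le \sum_{L \in \mathcal{F}_{10}} |L| + |\{x \in [0, 1) : S(f_\lambda)(x) > \lambda\}|.$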






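For each $L \in \mathcal{F}_{10}$, there is an interval $J \in \mathcal{F}_0$ such that $J \subseteq L$ and
$|J| \ge |L|/2$. The intervals in $\mathcal{F}_{10}$ are disjoint, and this implies that each $J \in \mathcal{F}_0$ is
associated to at most one $L \in \mathcal{F}_{10}$. This implies that

(13.32)    $\sum_{L \in \mathcal{F}_{10}} |L| \le 2 \sum_{J \in \mathcal{F}_0} |J|.$

The J's are disjoint, so that

(13.33)    $\sum_{L \in \mathcal{F}_{10}} |L| \le 2\, \Big| \bigcup_{J \in \mathcal{F}_0} J \Big| = 2\, \Big| \bigcup_{J \in \mathcal{F}} J \Big|,$

where the last step uses (13.20).

We also have that

(13.34)    $\bigcup_{J \in \mathcal{F}} J = \{x \in [0, 1) : M(f)(x) > \lambda\}.$

See (12.17); at the moment, we really only need that the left side is contained
in the right side, which one can easily get from (12.2). From this we obtain
that

(13.35)    $\Big| \bigcup_{J \in \mathcal{F}} J \Big| = |\{x \in [0, 1) : M(f)(x) > \lambda\}|,$

and hence

(13.36)    $\sum_{L \in \mathcal{F}_{10}} |L| \le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}|.$

Plugging this into (13.31) leads to

(13.37)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}| \le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}| + |\{x \in [0, 1) : S(f_\lambda)(x) > \lambda\}|.$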



The first term on the right side of (13.37) is fine for the goal of getting
estimates in terms of M(f). For the second we apply the Tchebychev
inequality and Lemma 13.11, to obtain

(13.38)    $\lambda^2\, |\{x \in [0, 1) : S(f_\lambda)(x) > \lambda\}| \le \int_{[0,1)} S(f_\lambda)(x)^2\, dx = \int_{[0,1)} |f_\lambda(x)|^2\, dx.$

Thus

(13.39)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}| \le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}| + \lambda^{-2} \int_{[0,1)} |f_\lambda(x)|^2\, dx.$



Lemma 13.40 $|f_\lambda(x)| \le \min(\lambda, M(f)(x))$ (at least almost everywhere).

(Exercise. The main point is that

(13.41)    $\Big| \frac{1}{|L|} \int_L f(y)\, dy \Big|$

is always less than or equal to M(f)(x) when $x \in L$, as in (12.2), and that
it is less than or equal to $\lambda$ if $L \in \mathcal{F}_1$ (and hence if $L \in \mathcal{F}_{10}$), or if $x \in L$ and
$x \in [0, 1) \setminus \big( \bigcup_{L \in \mathcal{F}_{10}} L \big)$. Indeed, these conditions on L imply that $L \notin \mathcal{F}$.)

(Actually, one also has that $M(f_\lambda)(x) \le \min(\lambda, M(f)(x))$ for all $x \in [0, 1)$.
This is not hard to check.)

Using the lemma, we may replace (13.39) with

(13.42)    $|\{x \in [0, 1) : S(f)(x) > \lambda\}| \le 2\, |\{x \in [0, 1) : M(f)(x) > \lambda\}| + \lambda^{-2} \int_{[0,1)} \min(\lambda, M(f)(x))^2\, dx.$



In deriving (13.42), we made two technical assumptions. The first was
that $\mathcal{F} \ne \emptyset$. If $\mathcal{F} = \emptyset$, then the first term on the right side of (13.42) is equal
to 0. In this case, we can take $f_\lambda = f$, and the argument works in the same
manner as before (and is simpler). (Note that $M(f)(x) \le \lambda$ for all $x \in [0, 1)$
in this case.) The second assumption was that [0, 1) is not an element of $\mathcal{F}_0$.
If [0, 1) is an element of $\mathcal{F}_0 \subseteq \mathcal{F}$, then

(13.43)    $\Big| \int_{[0,1)} f(y)\, dy \Big| > \lambda,$

and $\{x \in [0, 1) : M(f)(x) > \lambda\} = [0, 1)$. In this situation, (13.42) holds
automatically, with neither the second term on the right side nor the factor
of two in the first term on the right side being needed. To summarize, (13.42)
in fact holds without these two assumptions.
By Lemma 12.42, we have that

(13.44)    $\int_{[0,1)} S(f)(x)^p\, dx = \int_0^\infty p\, \lambda^{p-1}\, |\{x \in [0, 1) : S(f)(x) > \lambda\}|\, d\lambda$

and

(13.45)    $\int_{[0,1)} M(f)(x)^p\, dx = \int_0^\infty p\, \lambda^{p-1}\, |\{x \in [0, 1) : M(f)(x) > \lambda\}|\, d\lambda.$

Integrating (13.42) in this way, we obtain that

(13.46)    $\int_{[0,1)} S(f)(x)^p\, dx \le 2 \int_{[0,1)} M(f)(x)^p\, dx + \int_0^\infty p\, \lambda^{p-3} \int_{[0,1)} \min(\lambda, M(f)(x))^2\, dx\, d\lambda.$

Let us interchange the order of integrations in the second term on the
right side of (13.46). This leads to

(13.47)    $\int_{[0,1)} \int_0^\infty p\, \lambda^{p-3} \min(\lambda, M(f)(x))^2\, d\lambda\, dx.$

The integral in $\lambda$ here can be computed exactly. Specifically, we have that

(13.48)    $\int_{M(f)(x)}^{\infty} p\, \lambda^{p-3}\, M(f)(x)^2\, d\lambda = \frac{p}{2 - p}\, M(f)(x)^p.$

Note that it is important that p < 2 here, for the convergence of the integral.
Also,

(13.49)    $\int_0^{M(f)(x)} p\, \lambda^{p-3}\, \lambda^2\, d\lambda = M(f)(x)^p.$

Plugging these formulae into (13.46), we obtain (13.17), as desired.
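Written out, the last step amounts to the following computation, where the integral over all thresholds is split at the height M(f)(x). The explicit constant is simply what this particular bookkeeping produces, under the reading of the preceding estimates given here; no claim is made that it is optimal.

```latex
\int_0^\infty p\,\lambda^{p-3}\,\min(\lambda, M(f)(x))^2\,d\lambda
  = \int_0^{M(f)(x)} p\,\lambda^{p-1}\,d\lambda
    + \int_{M(f)(x)}^{\infty} p\,\lambda^{p-3}\,M(f)(x)^2\,d\lambda
  = \Big(1 + \frac{p}{2-p}\Big)\, M(f)(x)^p ,

\int_{[0,1)} S(f)(x)^p\,dx
  \le \Big(2 + 1 + \frac{p}{2-p}\Big) \int_{[0,1)} M(f)(x)^p\,dx
  = \frac{6 - 2p}{2 - p} \int_{[0,1)} M(f)(x)^p\,dx .
```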

The constant that we get here blows up as $p \to 2$, which is silly, since
we know that the p = 2 case behaves well, as in Lemma 13.11. This is an
artifact of the argument, and one can get rid of it using interpolation. For
this it is convenient to consider the analogue of (13.17) with M(f) replaced
by |f| (for p > 1, such as p = 3/2).
13.3 Estimates, 2

Proposition 13.50 Let p be a positive real number, p < 2. There is a
constant $C_2(p) > 0$ so that

(13.51)    $\int_{[0,1)} M(f)(x)^p\, dx \le C_2(p) \int_{[0,1)} S(f)(x)^p\, dx$

for any function f on [0, 1).

The argument for this is very much analogous to the one in Section 13.2.
Let p < 2 and f be given, and let $\lambda > 0$ be given too. As before, we
would like to get a helpful upper bound for

(13.52)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}|.$

Lemma 13.53 The set $\{x \in [0, 1) : S(f)(x) > \lambda\}$ is a union of dyadic
subintervals of [0, 1).

Indeed, suppose that $w \in [0, 1)$ and $S(f)(w) > \lambda$. This is the same as
saying that

(13.54)    $\Big( |E_0(f)(w)|^2 + \sum_{j=1}^{\infty} |E_j(f)(w) - E_{j-1}(f)(w)|^2 \Big)^{1/2} > \lambda,$

by (13.1). From this we get automatically that

(13.55)    $\Big( |E_0(f)(w)|^2 + \sum_{j=1}^{l} |E_j(f)(w) - E_{j-1}(f)(w)|^2 \Big)^{1/2} > \lambda$

for some l. On the other hand, $E_j(f)$ is constant on dyadic intervals of length
$2^{-j}$, and hence $E_j(f)(w) = E_j(f)(y)$ for all $j \le l$ and for all y which lie in
the dyadic subinterval of [0, 1) of length $2^{-l}$ that contains w. For these y's,
we obtain that

(13.56)    $\Big( |E_0(f)(y)|^2 + \sum_{j=1}^{l} |E_j(f)(y) - E_{j-1}(f)(y)|^2 \Big)^{1/2} > \lambda.$

Therefore $S(f)(y) > \lambda$ for all y in this same dyadic interval. Thus there is
a dyadic interval which contains w and which lies in our set, and the lemma
follows, since w was an arbitrary element of the set.
Let $\mathcal{G}_0$ denote the collection of maximal dyadic subintervals of [0, 1) which
are contained in $\{x \in [0, 1) : S(f)(x) > \lambda\}$. We shall assume for the time
being that $\{x \in [0, 1) : S(f)(x) > \lambda\} \ne \emptyset$, so that $\mathcal{G}_0 \ne \emptyset$ as well. As usual,
we have that

(13.57)    $\bigcup_{J \in \mathcal{G}_0} J = \{x \in [0, 1) : S(f)(x) > \lambda\},$

and that the intervals in $\mathcal{G}_0$ are pairwise disjoint. In particular,

(13.58)    $\sum_{J \in \mathcal{G}_0} |J| = |\{x \in [0, 1) : S(f)(x) > \lambda\}|.$

Let us also assume that $\{x \in [0, 1) : S(f)(x) > \lambda\}$ is not equal to the
whole unit interval [0, 1). Let $\mathcal{G}_1$ denote the set of dyadic subintervals L of
[0, 1) such that there is a $J \in \mathcal{G}_0$ which satisfies

(13.59)    $J \subseteq L$ and $|J| = |L|/2$.

Thus $\mathcal{G}_1 \ne \emptyset$, since $\mathcal{G}_0 \ne \emptyset$ and $\{x \in [0, 1) : S(f)(x) > \lambda\} \ne [0, 1)$. Because
the elements of $\mathcal{G}_0$ are maximal dyadic intervals contained in $\{x \in [0, 1) : S(f)(x) > \lambda\}$,
we obtain that each $L \in \mathcal{G}_1$ is not a subset of $\{x \in [0, 1) : S(f)(x) > \lambda\}$,
i.e., there is a point $u \in L$ such that

(13.60)    $S(f)(u) \le \lambda.$
The definition (13.1) of S(f) then implies that

(13.61)    $\Big( |E_0(f)(u)|^2 + \sum_{j=1}^{l(L)} |E_j(f)(u) - E_{j-1}(f)(u)|^2 \Big)^{1/2} \le \lambda,$

where l(L) is chosen so that $2^{-l(L)} = |L|$. (If l(L) = 0, then the sum on the
left side of the inequality is interpreted as being 0.) We conclude that

(13.62)    $\Big( |E_0(f)(w)|^2 + \sum_{j=1}^{l(L)} |E_j(f)(w) - E_{j-1}(f)(w)|^2 \Big)^{1/2} \le \lambda$ for all $w \in L$,

since $E_j(f)(w)$ is constant on L when $j \le l(L)$.

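The intervals in $\mathcal{G}_1$ need not be pairwise disjoint, and, as usual, we can
pass to a subset $\mathcal{G}_{10}$ (of maximal elements of $\mathcal{G}_1$, as in Lemma 2.5) so that

(13.63)    $\bigcup_{L \in \mathcal{G}_{10}} L = \bigcup_{L \in \mathcal{G}_1} L$

and

(13.64)    $L_1 \cap L_2 = \emptyset$ when $L_1, L_2 \in \mathcal{G}_{10}$, $L_1 \ne L_2$.

The definition of $\mathcal{G}_1$ implies that

(13.65)    $\bigcup_{L \in \mathcal{G}_1} L \supseteq \bigcup_{J \in \mathcal{G}_0} J.$

Hence

(13.66)    $\bigcup_{L \in \mathcal{G}_{10}} L \supseteq \{x \in [0, 1) : S(f)(x) > \lambda\},$

by (13.63) and (13.57). On the other hand,

(13.67)    $\sum_{L \in \mathcal{G}_{10}} |L| \le \sum_{J \in \mathcal{G}_0} 2\, |J| = 2\, |\{x \in [0, 1) : S(f)(x) > \lambda\}|,$

because of the way that the intervals $L \in \mathcal{G}_1$ were chosen and (13.58).

Define a function $g_\lambda(x)$ on [0, 1) by

(13.68)    $g_\lambda(x) = \frac{1}{|L|} \int_L f(y)\, dy$ when $x \in L$, $L \in \mathcal{G}_{10}$,
           $g_\lambda(x) = f(x)$ when $x \in [0, 1) \setminus \big( \bigcup_{L \in \mathcal{G}_{10}} L \big)$.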

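Lemma 13.69 If K is a dyadic subinterval of [0, 1), and if either $K \supseteq L$
for some $L \in \mathcal{G}_{10}$, or $x \in K$ for some $x \in [0, 1) \setminus \big( \bigcup_{L \in \mathcal{G}_{10}} L \big)$, then

(13.70)    $\frac{1}{|K|} \int_K g_\lambda(u)\, du = \frac{1}{|K|} \int_K f(u)\, du.$

This is not hard to check, by writing K as the disjoint union of the set
$K \setminus \big( \bigcup_{L \in \mathcal{G}_{10}} L \big)$ and the intervals in $\mathcal{G}_{10}$ which are subsets of K.

Corollary 13.71 If $x \in [0, 1) \setminus \big( \bigcup_{L \in \mathcal{G}_{10}} L \big)$, then $M(f)(x) = M(g_\lambda)(x)$.

This follows immediately from Lemma 13.69.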
Corollary 13.72 If $x \in [0, 1) \setminus \big( \bigcup_{L \in \mathcal{G}_{10}} L \big)$, then $S(f)(x) = S(g_\lambda)(x)$. If
$L \in \mathcal{G}_{10}$, then

(13.73)    $S(g_\lambda)(v) = \Big( |E_0(f)(v)|^2 + \sum_{j=1}^{l(L)} |E_j(f)(v) - E_{j-1}(f)(v)|^2 \Big)^{1/2}$

for all $v \in L$ (where $2^{-l(L)} = |L|$, as before).

This can be derived from Lemma 13.69 and the definition (13.68) of $g_\lambda$.

Corollary 13.74 $S(g_\lambda)(x) \le \min(\lambda, S(f)(x))$ for all $x \in [0, 1)$.

That $S(g_\lambda)(x) \le S(f)(x)$ for all $x \in [0, 1)$ follows easily from Corollary
13.72. If $x \in [0, 1) \setminus \big( \bigcup_{L \in \mathcal{G}_{10}} L \big)$, then $S(g_\lambda)(x) = S(f)(x) \le \lambda$ because of
(13.66). If $x \in L$ for some $L \in \mathcal{G}_{10} \subseteq \mathcal{G}_1$, then $S(g_\lambda)(x) \le \lambda$ by (13.73) and
(13.62).



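Now let us apply $g_\lambda$ to the estimation of (13.52). Because of Corollary
13.71, we have that

(13.75)    $\{x \in [0, 1) : M(f)(x) > \lambda\} \subseteq \Big( \bigcup_{L \in \mathcal{G}_{10}} L \Big) \cup \{x \in [0, 1) : M(g_\lambda)(x) > \lambda\}.$

Hence

(13.76)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| \le \sum_{L \in \mathcal{G}_{10}} |L| + |\{x \in [0, 1) : M(g_\lambda)(x) > \lambda\}|,$

and therefore

(13.77)    $|\{x \in [0, 1) : M(f)(x) > \lambda\}| \le 2\, |\{x \in [0, 1) : S(f)(x) > \lambda\}| + |\{x \in [0, 1) : M(g_\lambda)(x) > \lambda\}|,$

by (13.67).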



Observe that

(13.78) $|\{x \in [0,1) : M(g_\lambda)(x) > \lambda\}| \le \lambda^{-2} \int_{[0,1)} M(g_\lambda)(u)^2\,du \le C\,\lambda^{-2} \int_{[0,1)} |g_\lambda(y)|^2\,dy$

for some constant $C > 0$ (independent of $g_\lambda$ and $\lambda$). This uses Proposition 12.32 with $p = 2$. On the other hand,



(13.79) $\int_{[0,1)} |g_\lambda(y)|^2\,dy = \int_{[0,1)} S(g_\lambda)(w)^2\,dw \le \int_{[0,1)} \min(\lambda, S(f)(w))^2\,dw$,

by Lemma 13.11 and Corollary 13.74. Hence



(13.80) $|\{x \in [0,1) : M(g_\lambda)(x) > \lambda\}| \le C\,\lambda^{-2} \int_{[0,1)} \min(\lambda, S(f)(w))^2\,dw$.

Putting this estimate into (13.77), we obtain that

(13.81) $|\{x \in [0,1) : M(f)(x) > \lambda\}| \le 2\,|\{x \in [0,1) : S(f)(x) > \lambda\}| + C\,\lambda^{-2} \int_{[0,1)} \min(\lambda, S(f)(w))^2\,dw$.

Near the beginning of this argument, we made two assumptions, which were that the set $\{x \in [0,1) : S(f)(x) > \lambda\}$ is neither empty nor all of $[0,1)$. If $\{x \in [0,1) : S(f)(x) > \lambda\}$ is all of $[0,1)$, then (13.81) holds automatically, without the second term on the right side, or the factor of 2 in the first term on the right side. If $\{x \in [0,1) : S(f)(x) > \lambda\} = \emptyset$, then the first term on the right side of (13.81) is equal to 0. In this case we merely take $g_\lambda = f$ on all of $[0,1)$, and the same argument works as before (with some simplifications). In particular, we have that $S(f) = S(g_\lambda) \le \lambda$ everywhere on $[0,1)$.

Thus we get (13.81), and without these extra assumptions. At this stage, the argument can use the same kind of computations as in Section 13.2, starting from (13.42). That is, we apply Lemma 12.42 and integrate in $\lambda$.

As before, we also get a constant which blows up as $p \to 2$, and this is silly, because Lemma 13.11 indicates that $p = 2$ is fine. This can be fixed.
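
To see where the blow-up comes from, here is a sketch of the integration in $\lambda$. It assumes that Lemma 12.42 is the usual distribution-function formula $\int F^p = p \int_0^\infty \lambda^{p-1} |\{F > \lambda\}|\,d\lambda$ (the lemma is not restated in this section, so this is an assumption), and that $0 < p < 2$.

```latex
\begin{align*}
\int_{[0,1)} M(f)(x)^p\,dx
  &= p \int_0^\infty \lambda^{p-1}\,
     |\{x \in [0,1) : M(f)(x) > \lambda\}|\,d\lambda \\
  &\le 2 \int_{[0,1)} S(f)(x)^p\,dx
     + C\,p \int_{[0,1)} \int_0^\infty \lambda^{p-3}
       \min(\lambda, S(f)(w))^2\,d\lambda\,dw
     \quad\text{by (13.81)}.
\end{align*}
% Inner integral, with s = S(f)(w) fixed:
\begin{align*}
\int_0^\infty \lambda^{p-3} \min(\lambda, s)^2\,d\lambda
  = \int_0^s \lambda^{p-1}\,d\lambda
    + s^2 \int_s^\infty \lambda^{p-3}\,d\lambda
  = \frac{s^p}{p} + \frac{s^p}{2-p}.
\end{align*}
% Hence
\begin{align*}
\int_{[0,1)} M(f)(x)^p\,dx
  \le \Big(2 + \frac{2\,C}{2-p}\Big) \int_{[0,1)} S(f)(x)^p\,dx .
\end{align*}
```

The factor $1/(2-p)$ comes from the tail integral over $\lambda > s$, which is the source of the constant that degenerates as $p \to 2$.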



13.4 Duality 

If $f_1$, $f_2$ are functions on $[0,1)$, and $l$ is a nonnegative integer, then

(13.82) $\int_{[0,1)} E_l(f_1)\,E_l(f_2)\,dx = \int_{[0,1)} \Big(E_0(f_1)\,E_0(f_2) + \sum_{j=1}^{l} (E_j(f_1) - E_{j-1}(f_1))(E_j(f_2) - E_{j-1}(f_2))\Big)\,dx$.

(As usual, the sum in $j$ on the right side is interpreted as being 0 when $l = 0$.) This is a "bilinear" version of (13.12) in Lemma 13.11. The proof is essentially the same, i.e., with respect to the usual integral inner product, $E_0(f_1)$ is orthogonal to $E_k(f_2) - E_{k-1}(f_2)$ for $k \ge 1$, $E_0(f_2)$ is orthogonal to $E_j(f_1) - E_{j-1}(f_1)$ for $j \ge 1$, and $E_j(f_1) - E_{j-1}(f_1)$ is orthogonal to $E_k(f_2) - E_{k-1}(f_2)$ when $j, k \ge 1$, $j \ne k$. Also,

(13.83) $E_l(f_1) = E_0(f_1) + \sum_{j=1}^{l} (E_j(f_1) - E_{j-1}(f_1))$,

(13.84) $E_l(f_2) = E_0(f_2) + \sum_{k=1}^{l} (E_k(f_2) - E_{k-1}(f_2))$.

Thus one can multiply $E_l(f_1)$ and $E_l(f_2)$, expand out the product using (13.83) and (13.84), integrate, and cancel out the cross terms using the orthogonality properties.
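
The identity (13.82) can be checked numerically on dyadic step functions. The following is a minimal sketch, assuming a step function is stored as a list of $2^m$ values on the dyadic intervals of length $2^{-m}$, so that integrals over $[0,1)$ become averages; the helper names `dyadic_average`, `integral`, and `bilinear_identity_sides` are hypothetical and chosen only for this illustration.

```python
import random

def dyadic_average(values, j):
    """E_j: average over dyadic blocks of length 2**-j (requires 2**j <= len(values))."""
    block = len(values) >> j  # number of samples per dyadic interval of length 2**-j
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

def integral(values):
    """Integral over [0, 1) of a step function given by equally spaced values."""
    return sum(values) / len(values)

def bilinear_identity_sides(f1, f2, l):
    """Return the left and right sides of (13.82) for real step functions f1, f2."""
    e1 = [dyadic_average(f1, j) for j in range(l + 1)]
    e2 = [dyadic_average(f2, j) for j in range(l + 1)]
    left = integral([a * b for a, b in zip(e1[l], e2[l])])
    integrand = [a * b for a, b in zip(e1[0], e2[0])]
    for j in range(1, l + 1):
        for i in range(len(f1)):
            integrand[i] += (e1[j][i] - e1[j - 1][i]) * (e2[j][i] - e2[j - 1][i])
    return left, integral(integrand)

random.seed(0)
f1 = [random.uniform(-1, 1) for _ in range(32)]
f2 = [random.uniform(-1, 1) for _ in range(32)]
left, right = bilinear_identity_sides(f1, f2, 3)
print(left, right)  # the two sides should agree up to rounding
```

The cross terms cancel only after integration, which is why the comparison is made between integrals rather than pointwise.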

Under suitable conditions, one has that

(13.85) $\int_{[0,1)} f_1\,f_2\,dx = \int_{[0,1)} \Big(E_0(f_1)\,E_0(f_2) + \sum_{j=1}^{\infty} (E_j(f_1) - E_{j-1}(f_1))(E_j(f_2) - E_{j-1}(f_2))\Big)\,dx$.

For instance, this is easy if at least one of $f_1$ and $f_2$ is a dyadic step function. (In this case

(13.86) $\int_{[0,1)} f_1\,f_2\,dx = \int_{[0,1)} E_l(f_1)\,E_l(f_2)\,dx$

for sufficiently large $l$.)

A consequence of (13.85) is that

(13.87) $\Big|\int_{[0,1)} f_1(x)\,f_2(x)\,dx\Big| \le \int_{[0,1)} S(f_1)(x)\,S(f_2)(x)\,dx$.

This follows by applying the Cauchy-Schwarz inequality to the integrand on the right side of (13.85). That is,

(13.88) $\Big|E_0(f_1)(x)\,E_0(f_2)(x) + \sum_{j=1}^{\infty} (E_j(f_1)(x) - E_{j-1}(f_1)(x))(E_j(f_2)(x) - E_{j-1}(f_2)(x))\Big|$

is less than or equal to the product of

(13.89) $\Big(|E_0(f_1)(x)|^2 + \sum_{j=1}^{\infty} |E_j(f_1)(x) - E_{j-1}(f_1)(x)|^2\Big)^{1/2}$

and

(13.90) $\Big(|E_0(f_2)(x)|^2 + \sum_{j=1}^{\infty} |E_j(f_2)(x) - E_{j-1}(f_2)(x)|^2\Big)^{1/2}$

for all $x \in [0,1)$, and this product is the same as

(13.91) $S(f_1)(x)\,S(f_2)(x)$.
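
The inequality (13.87) can also be tested on random dyadic step functions. This is a sketch under the same assumptions as the definition of $S(f)$ used in this chapter, with a step function stored as a list of $2^m$ values; for such a function the differences $E_j(f) - E_{j-1}(f)$ vanish for $j > m$, so the sum in $S(f)$ is finite. The function names are hypothetical.

```python
import math
import random

def dyadic_average(values, j):
    """E_j: average over dyadic blocks of length 2**-j (requires 2**j <= len(values))."""
    block = len(values) >> j
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

def square_function(values):
    """S(f) for a step function with len(values) = 2**m equally spaced values."""
    levels = len(values).bit_length() - 1
    prev = dyadic_average(values, 0)
    total = [v * v for v in prev]  # |E_0(f)|^2
    for j in range(1, levels + 1):
        cur = dyadic_average(values, j)
        total = [t + (c - p) ** 2 for t, c, p in zip(total, cur, prev)]
        prev = cur
    return [math.sqrt(t) for t in total]

def pairing_and_bound(f1, f2):
    """Return |integral of f1*f2| and the integral of S(f1)*S(f2), as in (13.87)."""
    n = len(f1)
    pairing = abs(sum(a * b for a, b in zip(f1, f2)) / n)
    s1, s2 = square_function(f1), square_function(f2)
    bound = sum(a * b for a, b in zip(s1, s2)) / n
    return pairing, bound

random.seed(0)
f1 = [random.gauss(0, 1) for _ in range(64)]
f2 = [random.gauss(0, 1) for _ in range(64)]
print(pairing_and_bound(f1, f2))  # first entry should not exceed the second
```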

Proposition 13.92 For each real number $q > 2$, there is a constant $C_3(q) > 0$ such that

(13.93) $\int_{[0,1)} |f(x)|^q\,dx \le C_3(q) \int_{[0,1)} S(f)(x)^q\,dx$.

To prove this, we shall use a duality argument. Let $q > 2$ be given, and let $p$, $1 < p < \infty$, denote the exponent dual to $q$, so that $1/p + 1/q = 1$. Thus $p < 2$.

It suffices to show that there is a constant $C$ such that

(13.94) $\Big|\int_{[0,1)} f(x)\,f_2(x)\,dx\Big| \le C\,\Big(\int_{[0,1)} S(f)^q\,dy\Big)^{1/q} \Big(\int_{[0,1)} |f_2|^p\,dw\Big)^{1/p}$

for all functions $f_2$ on $[0,1)$ which are dyadic step functions. (The sufficiency of this comes from choosing $f_2$ so that $f \cdot f_2$ is or approximates $|f|^q$, and using the fact that $p\,(q-1) = q$ and $1/p = 1 - 1/q$. More precisely, given any positive integer $k$, one can choose $f_2$ to be $E_k(f) \cdot |E_k(f)|^{q-2}$, where this is interpreted as being 0 when $E_k(f)$ is 0. Then $\int_{[0,1)} f \cdot f_2\,dx = \int_{[0,1)} |E_k(f)|^q\,dx$ and $|f_2|^p = |E_k(f)|^q$.)

Recall that Hölder's inequality (with this choice of $q$, $p$) says that

(13.95) $\int_{[0,1)} g(x)\,h(x)\,dx \le \Big(\int_{[0,1)} g(y)^q\,dy\Big)^{1/q} \Big(\int_{[0,1)} h(w)^p\,dw\Big)^{1/p}$

for arbitrary nonnegative functions $g$, $h$ on $[0,1)$. Applying this to the right side of (13.87) (with $f_1 = f$), we obtain

(13.96) $\Big|\int_{[0,1)} f(x)\,f_2(x)\,dx\Big| \le \Big(\int_{[0,1)} S(f)^q\,dy\Big)^{1/q} \Big(\int_{[0,1)} S(f_2)^p\,dw\Big)^{1/p}$.

We can now get (13.94) from this and Proposition 13.16, since $p < 2$. (We are also using Proposition 12.32 here.)

Remark 13.97 This general method works without the restriction to $q > 2$, i.e., inequalities like

(13.98) $\Big(\int_{[0,1)} S(f_2)^p\,dw\Big)^{1/p} \le K\,\Big(\int_{[0,1)} |f_2|^p\,dw\Big)^{1/p}$

(for some constant $K$ and arbitrary functions $f_2$) lead to inequalities of the form

(13.99) $\Big(\int_{[0,1)} |f|^q\,dx\Big)^{1/q} \le K\,\Big(\int_{[0,1)} S(f)^q\,dx\Big)^{1/q}$

for the conjugate exponent $q$, $1/q + 1/p = 1$ (as long as $p, q > 1$).



13.5 Duality, continued 

Proposition 13.100 For each real number $q > 2$, there is a constant $C_4(q) > 0$ such that

(13.101) $\int_{[0,1)} S(f)(x)^q\,dx \le C_4(q) \int_{[0,1)} |f(x)|^q\,dx$

(for arbitrary $f$ on $[0,1)$).

Let $q > 2$ be given. It is enough to show that there is a constant $C_4(q) > 0$ so that

(13.102) $\int_{[0,1)} S_l(f)(x)^q\,dx \le C_4(q) \int_{[0,1)} |f(x)|^q\,dx$

for all integers $l \ge 0$ and all $f$.

Let $p$ be the conjugate exponent to $q$, so that $1/p + 1/q = 1$. It suffices to show that there is a constant $C > 0$ so that

(13.103) $\Big|\int_{[0,1)} \Big(\alpha_0(x)\,E_0(f)(x) + \sum_{j=1}^{l} \alpha_j(x)\,(E_j(f)(x) - E_{j-1}(f)(x))\Big)\,dx\Big| \le C\,\Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\alpha_j(x)|^2\Big)^{p/2} dx\Big)^{1/p} \Big(\int_{[0,1)} |f(w)|^q\,dw\Big)^{1/q}$

for arbitrary functions $\alpha_0(x), \alpha_1(x), \ldots, \alpha_l(x)$ on $[0,1)$. (For the sufficiency of this one can choose the $\alpha_j$'s to be given by

(13.104) $\alpha_0 = E_0(f) \cdot S_l(f)^{q-2}$

and

(13.105) $\alpha_j = (E_j(f) - E_{j-1}(f)) \cdot S_l(f)^{q-2}$

when $j \ge 1$. In this case

(13.106) $\alpha_0 \cdot E_0(f) + \sum_{j=1}^{l} \alpha_j \cdot (E_j(f) - E_{j-1}(f)) = S_l(f)^2 \cdot S_l(f)^{q-2} = S_l(f)^q$

and

(13.107) $\Big(\sum_{j=0}^{l} |\alpha_j|^2\Big)^{1/2} = S_l(f) \cdot S_l(f)^{q-2} = S_l(f)^{q-1}$.

Since $p$ is conjugate to $q$, $p\,(q-1) = q$ and $1/p = 1 - 1/q$.)
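It is easy to check that

(13.108) $\int_{[0,1)} g(x)\,E_j(h)(x)\,dx = \int_{[0,1)} E_j(g)(x)\,h(x)\,dx$

for any functions $g$, $h$ on $[0,1)$ and nonnegative integer $j$. Thus (13.103) is equivalent to

(13.109) $\Big|\int_{[0,1)} \Big(E_0(\alpha_0)(x) + \sum_{j=1}^{l} (E_j(\alpha_j)(x) - E_{j-1}(\alpha_j)(x))\Big)\,f(x)\,dx\Big| \le C\,\Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\alpha_j(x)|^2\Big)^{p/2} dx\Big)^{1/p} \Big(\int_{[0,1)} |f(w)|^q\,dw\Big)^{1/q}$.

Because of Hölder's inequality, this inequality will follow if we can prove that

(13.110) $\Big(\int_{[0,1)} \Big|E_0(\alpha_0)(x) + \sum_{j=1}^{l} (E_j(\alpha_j)(x) - E_{j-1}(\alpha_j)(x))\Big|^p dx\Big)^{1/p} \le C'\,\Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\alpha_j(x)|^2\Big)^{p/2} dx\Big)^{1/p}$

for some constant $C'$, all $l \ge 0$, and arbitrary functions $\alpha_0(x), \alpha_1(x), \ldots, \alpha_l(x)$.

From Proposition 13.50 we know that the left side of the above inequality is bounded by a constant times

(13.111) $\Big(\int_{[0,1)} S\Big(E_0(\alpha_0) + \sum_{j=1}^{l} (E_j(\alpha_j) - E_{j-1}(\alpha_j))\Big)(x)^p\,dx\Big)^{1/p}$.

Thus we would like to show that this is bounded by a constant times

(13.112) $\Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\alpha_j(x)|^2\Big)^{p/2} dx\Big)^{1/p}$.

On the other hand, one can check that

(13.113) $S\Big(E_0(\alpha_0) + \sum_{j=1}^{l} (E_j(\alpha_j) - E_{j-1}(\alpha_j))\Big)(x) = \Big(|E_0(\alpha_0)(x)|^2 + \sum_{j=1}^{l} |E_j(\alpha_j)(x) - E_{j-1}(\alpha_j)(x)|^2\Big)^{1/2}$.

Hence we would like to show that

(13.114) $\Big(\int_{[0,1)} \Big(|E_0(\alpha_0)(x)|^2 + \sum_{j=1}^{l} |E_j(\alpha_j)(x) - E_{j-1}(\alpha_j)(x)|^2\Big)^{p/2} dx\Big)^{1/p}$

is bounded by a constant times (13.112). This will be handled in the next section.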



13.6 Some inequalities 

Let $l$ be a nonnegative integer, and let $\beta_0(x), \beta_1(x), \ldots, \beta_l(x)$ be functions on $[0,1)$. Given $p, r > 1$, consider the problem of bounding

(13.115) $\Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |E_j(\beta_j)(x)|^r\Big)^{p/r} dx\Big)^{1/p}$

by a constant times

(13.116) $\Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\beta_j(x)|^r\Big)^{p/r} dx\Big)^{1/p}$

(where the constant does not depend on $l$ or $\beta_0(x), \beta_1(x), \ldots, \beta_l(x)$).

The situation at the end of the previous section corresponds to $r = 2$, $1 < p < 2$. At that stage, cancellation was not really at issue, and so we pass to the variant here. Furthermore, the $\beta_j$'s might as well be nonnegative for these types of inequalities.

If $p$ happens to be equal to $r$, then the bound holds with constant equal to 1, and it reduces to the inequality

(13.117) $\int_{[0,1)} |E_j(\beta)(x)|^p\,dx \le \int_{[0,1)} |\beta(x)|^p\,dx$

for any function $\beta$ on $[0,1)$. This inequality follows from that of Jensen. More precisely, Jensen's inequality can be applied to get

(13.118) $\Big|\frac{1}{|J|} \int_J \beta(y)\,dy\Big|^p \le \frac{1}{|J|} \int_J |\beta(y)|^p\,dy$

for any interval $J$. The previous inequality uses this one for the dyadic subintervals $J$ of $[0,1)$ of length $2^{-j}$, summing over all these dyadic subintervals to get the integrals over $[0,1)$.
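
As an illustration of (13.117), the following sketch compares the two integrals for a random step function. It assumes $p \ge 1$ and stores a step function as a list of $2^m$ equally spaced values; the names `dyadic_average`, `lp_integral`, and `jensen_gap` are hypothetical.

```python
import random

def dyadic_average(values, j):
    """E_j: average over dyadic blocks of length 2**-j (requires 2**j <= len(values))."""
    block = len(values) >> j
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

def lp_integral(values, p):
    """Integral over [0, 1) of |beta|^p for a step function beta."""
    return sum(abs(v) ** p for v in values) / len(values)

def jensen_gap(beta, j, p):
    """Right side minus left side of (13.117); nonnegative when p >= 1."""
    return lp_integral(beta, p) - lp_integral(dyadic_average(beta, j), p)

random.seed(0)
beta = [random.uniform(-1, 1) for _ in range(64)]
for p in (1.0, 1.5, 2.0, 4.0):
    print(p, jensen_gap(beta, 3, p))
```

The gap is zero when $\beta$ is already constant on the dyadic intervals of length $2^{-j}$, since then $E_j(\beta) = \beta$.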



If $r = \infty$, then we interpret

(13.119) $\Big(\sum_{j=0}^{l} |\beta_j(x)|^r\Big)^{1/r}$, $\Big(\sum_{j=0}^{l} |E_j(\beta_j)(x)|^r\Big)^{1/r}$

as being

(13.120) $\sup_{0 \le j \le l} |\beta_j(x)|$, $\sup_{0 \le j \le l} |E_j(\beta_j)(x)|$

(as usual). In this case we may as well assume that all of the $\beta_j$'s are the same, because we could replace them all with $\sup_{0 \le j \le l} |\beta_j(x)|$ (which would not affect (13.116), and would increase (13.115) or keep it the same). The bounds in question would then be essentially ones about the maximal function $M(f)$, and in fact we have such bounds for $1 < p \le \infty$, as in Lemma 12.13 and Proposition 12.32.



Now suppose that $1 < r < p < \infty$. Let $s \in (1, \infty)$ be conjugate to $p/r$, in the sense that

(13.121) $1/s + r/p = 1$.

To show that (13.115) is bounded by a constant times (13.116), it suffices to show that there is a constant $C$ so that

(13.122) $\int_{[0,1)} \Big(\sum_{j=0}^{l} |E_j(\beta_j)(x)|^r\Big)\,h(x)\,dx \le C\,\Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\beta_j(x)|^r\Big)^{p/r} dx\Big)^{r/p} \Big(\int_{[0,1)} h(w)^s\,dw\Big)^{1/s}$

for all nonnegative functions $h$ on $[0,1)$, and all $l \ge 0$ and $\beta_0, \beta_1, \ldots, \beta_l$. Since $r > 1$, we have that

(13.123) $|E_j(\beta_j)(x)|^r \le E_j(|\beta_j|^r)(x)$

for all $x \in [0,1)$. This follows from Jensen's inequality, as in (13.118) (with the $p$ there replaced by $r$). Thus

(13.124) $\int_{[0,1)} \Big(\sum_{j=0}^{l} |E_j(\beta_j)(x)|^r\Big)\,h(x)\,dx \le \int_{[0,1)} \Big(\sum_{j=0}^{l} E_j(|\beta_j|^r)(x)\Big)\,h(x)\,dx$.

On the other hand, $\int_{[0,1)} E_j(g)(x)\,h(x)\,dx = \int_{[0,1)} g(x)\,E_j(h)(x)\,dx$ for arbitrary functions $g$ and $h$ on $[0,1)$, and so we get that

(13.125) $\int_{[0,1)} \Big(\sum_{j=0}^{l} |E_j(\beta_j)(x)|^r\Big)\,h(x)\,dx \le \int_{[0,1)} \sum_{j=0}^{l} |\beta_j(x)|^r\,E_j(h)(x)\,dx$.

The maximal function $M(h) = \sup_j E_j(h)$ can be put into the right side, to get

(13.126) $\int_{[0,1)} \Big(\sum_{j=0}^{l} |E_j(\beta_j)(x)|^r\Big)\,h(x)\,dx \le \int_{[0,1)} \sum_{j=0}^{l} |\beta_j(x)|^r\,M(h)(x)\,dx$.

Since $M(h)$ does not depend on $j$, it can be separated from the sum, and one can apply Hölder's inequality with exponents $p/r$, $s$ to obtain

(13.127) $\int_{[0,1)} \Big(\sum_{j=0}^{l} |E_j(\beta_j)(x)|^r\Big)\,h(x)\,dx \le \Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\beta_j(x)|^r\Big)^{p/r} dx\Big)^{r/p} \Big(\int_{[0,1)} M(h)(w)^s\,dw\Big)^{1/s}$.

This is exactly what we want, except that the last factor contains $M(h)$ instead of $h$ by itself. One can get rid of this, with an extra constant factor on the right side, using Proposition 12.32. (Note that $s > 1$ when $p < \infty$.)

Thus (13.115) is bounded by a constant times (13.116) when $1 < r < p < \infty$. If $1 < p < r < \infty$, then one can get a similar inequality by duality. In other words, if $p'$, $r'$ are the exponents conjugate to $p$, $r$, respectively, so that

(13.128) $1/p + 1/p' = 1$, $1/r + 1/r' = 1$,

then one can get the same estimate for $(p, r)$ as for $(p', r')$.

To make this precise, let us note that (13.115) is bounded by a constant $A$ times (13.116) (for all $\beta_0(x), \beta_1(x), \ldots, \beta_l(x)$) if and only if

(13.129) $\Big|\int_{[0,1)} \Big(\sum_{j=0}^{l} E_j(\beta_j)(x)\,\gamma_j(x)\Big)\,dx\Big|$

is bounded by $A$ times

(13.130) $\Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\beta_j(x)|^r\Big)^{p/r} dx\Big)^{1/p} \Big(\int_{[0,1)} \Big(\sum_{j=0}^{l} |\gamma_j(x)|^{r'}\Big)^{p'/r'} dx\Big)^{1/p'}$

for all $\beta_0(x), \beta_1(x), \ldots, \beta_l(x)$ and $\gamma_0(x), \gamma_1(x), \ldots, \gamma_l(x)$. The "only if" part of this statement can be verified using Hölder's inequality twice. For the "if" part, one makes suitable choices of the $\gamma_j$'s in terms of the $E_k(\beta_k)$'s to get back to the original integrals and sums.

As in (13.108), (13.129) is equal to

(13.131) $\Big|\int_{[0,1)} \Big(\sum_{j=0}^{l} \beta_j(x)\,E_j(\gamma_j)(x)\Big)\,dx\Big|$.

This permits one to derive the statement mentioned above, about having the same estimate for $(p, r)$ as for $(p', r')$. The assumption that $p < r$ exactly corresponds to $r' < p'$, which brings us back to the previous case.

In this kind of argument, one can allow for exponents which are infinite, 
with suitable adjustments to the formulae. In particular, suprema would be 
used when necessary. 



13.7 Another inequality for p = 1 

Proposition 13.132 (Weak-type estimate for $S(f)$) If $\lambda$ is a positive real number and $f(x)$ is an arbitrary function on $[0,1)$, then

(13.133) $|\{x \in [0,1) : S(f)(x) > \lambda\}| \le \frac{3}{\lambda} \int_{[0,1)} |f(x)|\,dx$.

To prove this we can use much the same argument as in Section 13.2. More precisely, (13.39) and Lemma 13.40 in Section 13.2 imply that

(13.134) $|\{x \in [0,1) : S(f)(x) > \lambda\}| \le 2\,|\{x \in [0,1) : M(f)(x) > \lambda\}| + \lambda^{-1} \int_{[0,1)} |f_\lambda(x)|\,dx$,

where $f_\lambda(x)$ is as in (13.26). Proposition 12.14 in Section 12.2 can be applied to convert this to

(13.135) $|\{x \in [0,1) : S(f)(x) > \lambda\}| \le 2\,\lambda^{-1} \int_{[0,1)} |f(x)|\,dx + \lambda^{-1} \int_{[0,1)} |f_\lambda(x)|\,dx$.

On the other hand,

(13.136) $\int_{[0,1)} |f_\lambda(x)|\,dx \le \int_{[0,1)} |f(x)|\,dx$

for all $\lambda > 0$. This can be derived from the definition (13.26) of $f_\lambda(x)$. More precisely, for any interval $L$ in $[0,1)$ we have that

(13.137) $\Big|\frac{1}{|L|} \int_L f(x)\,dx\Big| \le \frac{1}{|L|} \int_L |f(x)|\,dx$,

and this can be used in the decompositions involved in the definition of $f_\lambda$. Once one has (13.136), (13.133) follows easily from (13.135), with the constant $3 = 2 + 1$ coming from the two terms on the right side.
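
The weak-type bound can be probed numerically. The sketch below assumes the constant in (13.133) is 3, as obtained by adding the two terms in (13.135) and applying (13.136), and it restricts to dyadic step functions stored as lists of $2^m$ values; the helper names are hypothetical.

```python
import math
import random

def dyadic_average(values, j):
    """E_j: average over dyadic blocks of length 2**-j (requires 2**j <= len(values))."""
    block = len(values) >> j
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

def square_function(values):
    """S(f) for a step function with len(values) = 2**m equally spaced values."""
    levels = len(values).bit_length() - 1
    prev = dyadic_average(values, 0)
    total = [v * v for v in prev]
    for j in range(1, levels + 1):
        cur = dyadic_average(values, j)
        total = [t + (c - p) ** 2 for t, c, p in zip(total, cur, prev)]
        prev = cur
    return [math.sqrt(t) for t in total]

def weak_type_sides(values, lam):
    """Return |{S(f) > lam}| and (3 / lam) * integral of |f| over [0, 1)."""
    n = len(values)
    measure = sum(1 for s in square_function(values) if s > lam) / n
    bound = 3.0 / lam * sum(abs(v) for v in values) / n
    return measure, bound

random.seed(0)
f = [random.gauss(0, 1) if random.random() < 0.2 else 0.0 for _ in range(128)]
for lam in (0.25, 0.5, 1.0, 2.0):
    print(lam, weak_type_sides(f, lam))
```

Sparse, spiky inputs such as the one above are the interesting case, since for them $\int |f|$ is small while $S(f)$ can be large on a small set.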



13.8 Variants 

The techniques applied in this chapter have versions that are used in numer- 
ous settings in analysis. There are also some alternatives, in this and other 
situations. 

A basic point is that one frequently considers classes of linear operators 
(which are compatible with some underlying geometry or structure). One 
could apply the results of this chapter to the study of linear operators, but
one can also approach linear operators directly, through similar arguments 
(i.e., with similar approximations and comparisons). For linear operators 
duality can be more straightforward, especially with operators or classes of 
operators which are self-dual. Let us note that sublinear operators such as 
those examined here can often be viewed in terms of linear operators which 
take ordinary scalar-valued functions to functions with values in a normed 
vector space. This perspective is sometimes helpful, and more generally one 
can look at operators which take vector-valued functions to vector-valued 
functions. 

A very famous linear operator is the Hilbert transform or conjugation operator,
acting on functions on the unit circle. This operator can be described
in terms of complex analysis as the operator on boundary value functions
which corresponds to harmonic conjugation of harmonic functions on the 
unit disk in the complex plane. This operator also has a simple description 
in terms of Fourier series, and from it one can obtain operators which trun- 
cate a given Fourier series (so that bounds for the Hilbert transform lead to 
bounds for the truncation operators). There is a version of this operator on 
the real line, with analogous features in terms of harmonic conjugation and 
Fourier transforms (and explicit links to the operator on the unit circle). 

Classically, connections with complex analysis were used as a very strong 
tool in the study of certain special operators like the Hilbert transform. The 
results of this could then be applied to Fourier series, for instance, where 
complex analysis did not seem to be involved, and to other operators which 
could be treated in terms of these building blocks. 

This is quite beautiful and remarkable, and has numerous interesting 
aspects. On the other hand, there was also the problem of having more 
direct real-variable methods, which should both be interesting in their own 
right and have further applications. 

The paper [CalZ] of Calderon and Zygmund was a fundamental step in
this development. The arguments in this chapter have followed the same 
rough plan (for which there are versions in a number of settings). In general, 
there are mild additional terms and other adjustments involved, compared 
to the situation here. Depending on the circumstances, there can also be 
special structure which allows simplification in different ways. There are 
other circumstances in which analogous questions come up and the basic 
methods are not directly sufficient or applicable, and a variety of refinements 
and additional tools have emerged. 

13.9 Some remarks concerning p = 4 

Suppose that $f(x)$ is a function on $[0,1)$, and consider the integral

(13.138) $\int_{[0,1)} S(f)(x)^4\,dx$.

It turns out that this can be written as $A + 2B$, where

(13.139) $A = \int_{[0,1)} \Big(|E_0(f)(x)|^4 + \sum_{j=1}^{\infty} |E_j(f)(x) - E_{j-1}(f)(x)|^4\Big)\,dx$

and

(13.140) $B = \int_{[0,1)} |E_0(f)(x)|^2\,E_0(|f - E_0(f)|^2)(x)\,dx + \int_{[0,1)} \sum_{j=1}^{\infty} |E_j(f)(x) - E_{j-1}(f)(x)|^2\,E_j(|f - E_j(f)|^2)(x)\,dx$.

Let us first sketch the computations behind this, leaving the details as an 
exercise, and then discuss some of its consequences. 

One can start with the definition (13.1) of $S(f)(x)$, which expresses $S(f)(x)^2$ as a sum. Thus $S(f)(x)^4$ can be given as a square of a sum, which
can be expanded into a double sum of the products of the terms in the initial 
sum. If j and k are the indices in the double sum, then the double sum can 
be decomposed into three parts, corresponding to j = k, j < k, and j > k. 
The integral of the j = k part is given exactly by A above, by inspection. 
The j < k and j > k parts are equal to each other, by the symmetry of the 
sum. It remains to show that the integral over [0, 1) of the j < k part is 
equal to B above. This equality does not work directly at the level of the 
integrands, as for the j = k part, but uses the integration. 

For each $j \ge 0$, let $B_j$ be the part of $B$ that corresponds to $j$ in the obvious manner. It is enough to show that $B_j$ is equal to the integral over $[0,1)$ of the piece of the original double sum that comes from restricting ourselves to this particular $j$ and summing over $k > j$.

Fix $j \ge 0$, and let $I$ be a dyadic subinterval of $[0,1)$ of length equal to $2^{-j}$. A key point is that

(13.141) $\int_I \sum_{k=j+1}^{\infty} |E_k(f)(x) - E_{k-1}(f)(x)|^2\,dx = \int_I |f(x) - E_j(f)(x)|^2\,dx$.

Note that $E_j(f)(x)$ is in fact constant on $I$, since $|I| = 2^{-j}$. This equality comes from orthogonality of the $(E_k(f) - E_{k-1}(f))$'s, $k > j$, on $I$, as in the situation of Lemma 13.11.

The formula for $B_j$ that we want says that one integral over $[0,1)$ is the same as another integral over $[0,1)$. To establish this it suffices to show that the corresponding integrals over dyadic intervals of length $2^{-j}$ are the same. Fix such an interval $I$. Inside both integrals one has the expression $|E_0(f)(x)|^2$ when $j = 0$ and $|E_j(f)(x) - E_{j-1}(f)(x)|^2$ when $j \ge 1$. These expressions are constant on $I$, since $|I| = 2^{-j}$. Hence these expressions can be pulled out of the integrals whose equality is to be shown. The integrals that are left reduce to the two sides of (13.141), and this yields the desired equality.
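
The decomposition of $\int S(f)^4$ as $A + 2B$ can be checked directly for dyadic step functions, for which all sums are finite. This is a minimal sketch, assuming $f$ is real-valued and stored as a list of $2^m$ values; the function names are hypothetical.

```python
import random

def dyadic_average(values, j):
    """E_j: average over dyadic blocks of length 2**-j (requires 2**j <= len(values))."""
    block = len(values) >> j
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

def fourth_power_identity(values):
    """Return (integral of S(f)^4, A + 2B) with A, B as in (13.139), (13.140)."""
    n = len(values)
    levels = n.bit_length() - 1
    e = [dyadic_average(values, j) for j in range(levels + 1)]
    # d[0] = E_0(f), d[j] = E_j(f) - E_{j-1}(f) for j >= 1
    d = [e[0]] + [[a - b for a, b in zip(e[j], e[j - 1])]
                  for j in range(1, levels + 1)]
    s_squared = [sum(d[j][i] ** 2 for j in range(levels + 1)) for i in range(n)]
    lhs = sum(s * s for s in s_squared) / n
    a_term = sum(d[j][i] ** 4 for j in range(levels + 1) for i in range(n)) / n
    b_term = 0.0
    for j in range(levels + 1):
        # E_j(|f - E_j(f)|^2), which vanishes at the finest level j = levels
        tail = dyadic_average([(v - ej) ** 2 for v, ej in zip(values, e[j])], j)
        b_term += sum(dj * dj * t for dj, t in zip(d[j], tail)) / n
    return lhs, a_term + 2.0 * b_term

random.seed(0)
f = [random.uniform(-1, 1) for _ in range(64)]
print(fourth_power_identity(f))  # the two entries should agree up to rounding
```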

This completes the sketch of the proof that (13.138) is equal to $A + 2B$, where $A$ and $B$ are as in (13.139) and (13.140). As a consequence of this, it is not hard to see that there is a constant $C > 0$ so that

(13.142) $\int_{[0,1)} S(f)(x)^4\,dx \le C \int_{[0,1)} S(f)(x)^2\,M(|f|^2)(x)\,dx$

for arbitrary functions $f(x)$ on $[0,1)$. (Note that $M(f)(x)^2 \le M(|f|^2)(x)$ for all $x$, by Jensen's inequality.)

Using (13.142), one can obtain that

(13.143) $\int_{[0,1)} S(f)(x)^4\,dx \le C' \int_{[0,1)} |f(x)|^4\,dx$

for some constant $C' > 0$ and arbitrary $f$. This also relies on the Cauchy-Schwarz inequality, and on Proposition 12.32 in Section 12.4 with $p = 2$ and $|f|^2$ instead of $f$ to estimate $\int_{[0,1)} M(|f|^2)(x)^2\,dx$ in terms of $\int_{[0,1)} |f(x)|^4\,dx$. In other words, we get (13.101) in Proposition 13.100 in Section 13.5 for $q = 4$. Once one knows this, one can prove the analogue of Proposition 13.16 in Section 13.2 for $p < 4$ instead of $p < 2$, through essentially the same method as before.

Similarly, the interpolation results in Chapter 14 (especially Section 14.6) permit one to derive inequalities for $2 < p < 4$ from the cases of $p = 2, 4$ (and this works in a very general manner).

This type of approach works in a number of settings. That is, for spe- 
cial p's there are special arguments or computations, and then one can ex- 
tend from there using other methods. A famous instance of this occurs in 
M. Riesz's estimates | pi.ie| | for the Hilbert transform (mentioned in the previ- 
ous section). 



Chapter 14 

Interpolation of operators 



We follow here the method of |[Rie|| . (See |PerL| , pe| , pad| , |SteW2| , |Zygl|| for 
further information.) 



14.1 The basic result 

Let $(a_{j,k})$ be an $n \times n$ matrix of complex numbers. We associate to this a bilinear form $A(x,y)$ defined for $x, y \in \mathbf{C}^n$ by

(14.1) $A(x,y) = \sum_{j=1}^{n} \sum_{k=1}^{n} y_j\,a_{j,k}\,x_k$.

For $1 \le p \le \infty$ define $M_p$ as follows. If $1 < p < \infty$, then we set

(14.2) $M_p = \sup\Big\{|A(x,y)| : x, y \in \mathbf{C}^n,\ \Big(\sum_{k=1}^{n} |x_k|^p\Big)^{1/p} \le 1,\ \Big(\sum_{j=1}^{n} |y_j|^{p'}\Big)^{1/p'} \le 1\Big\}$.

Throughout this chapter $p'$ denotes the conjugate exponent of $p$, so that

(14.3) $\frac{1}{p} + \frac{1}{p'} = 1$.

Thus $1 < p' < \infty$ when $1 < p < \infty$. If $p = 1$, so that $p' = \infty$, then we set

(14.4) $M_1 = \sup\Big\{|A(x,y)| : x, y \in \mathbf{C}^n,\ \sum_{k=1}^{n} |x_k| \le 1,\ \max_{1 \le j \le n} |y_j| \le 1\Big\}$.

If $p = \infty$, so that $p' = 1$, then we put

(14.5) $M_\infty = \sup\Big\{|A(x,y)| : x, y \in \mathbf{C}^n,\ \max_{1 \le k \le n} |x_k| \le 1,\ \sum_{j=1}^{n} |y_j| \le 1\Big\}$.

In each case, $M_p$ is a nonnegative real number.

Theorem 14.6 The function $\log M_p$ is convex as a function of $1/p \in [0,1]$. That is, if $1 \le p < q \le \infty$ and $0 < t < 1$, and if $r$ satisfies

(14.7) $\frac{1}{r} = \frac{t}{p} + \frac{1-t}{q}$,

then

(14.8) $M_r \le M_p^t\,M_q^{1-t}$.

The analogous result holds (and by essentially the same proof) if the $a_{j,k}$'s are all real numbers, and we restrict our attention to $x$, $y$ in $\mathbf{R}^n$ in the definition of $M_p$.

Note that if $M_p = 0$ for some $p$, then $A \equiv 0$ and $M_p = 0$ for all $p$. One may as well assume that $A \not\equiv 0$ in the arguments that follow.

We shall discuss the proof of this theorem in this and the next sections. We begin by observing that $M_p$ can also be given as

(14.9) $M_p = \sup\Big\{\Big(\sum_{j=1}^{n} \Big|\sum_{k=1}^{n} a_{j,k}\,x_k\Big|^p\Big)^{1/p} : x \in \mathbf{C}^n,\ \Big(\sum_{k=1}^{n} |x_k|^p\Big)^{1/p} \le 1\Big\}$

when $1 \le p < \infty$, and, for $p = \infty$, as

(14.10) $M_\infty = \sup\Big\{\max_{1 \le j \le n} \Big|\sum_{k=1}^{n} a_{j,k}\,x_k\Big| : x \in \mathbf{C}^n,\ \max_{1 \le k \le n} |x_k| \le 1\Big\}$.

Indeed, this definition of $M_p$ is greater than or equal to the previous one because of Hölder's inequality. Conversely, for each $x \in \mathbf{C}^n$ one can choose $y \in \mathbf{C}^n$ so that the inequality in Hölder's inequality becomes an equality, and so that

(14.11) $\Big(\sum_{j=1}^{n} |y_j|^{p'}\Big)^{1/p'} = 1$

when $1 \le p' < \infty$, and

(14.12) $\max_{1 \le j \le n} |y_j| = 1$

when $p' = \infty$. This is not hard to verify, and it implies that the original definition of $M_p$ is greater than or equal to the version above. Therefore the two are equal.

Similarly,

(14.13) $M_p = \sup\Big\{\Big(\sum_{k=1}^{n} \Big|\sum_{j=1}^{n} y_j\,a_{j,k}\Big|^{p'}\Big)^{1/p'} : y \in \mathbf{C}^n,\ \Big(\sum_{j=1}^{n} |y_j|^{p'}\Big)^{1/p'} \le 1\Big\}$

when $1 < p \le \infty$ (so that $p' < \infty$), and

(14.14) $M_1 = \sup\Big\{\max_{1 \le k \le n} \Big|\sum_{j=1}^{n} y_j\,a_{j,k}\Big| : y \in \mathbf{C}^n,\ \max_{1 \le j \le n} |y_j| \le 1\Big\}$.

The equivalence of these expressions for $M_p$ is a special case of duality.

Remark 14.15 For $p = 1$ and $q = \infty$ there is an earlier and simpler approach of Schur. A key point is that

(14.16) $M_1 = \max_{1 \le k \le n} \sum_{j=1}^{n} |a_{j,k}|$

and

(14.17) $M_\infty = \max_{1 \le j \le n} \sum_{k=1}^{n} |a_{j,k}|$.

(Exercise.) Once one has these formulae, one can estimate $M_r$ for $1 < r < \infty$ using Jensen's inequality or Hölder's inequality. In general, $M_p$ does not admit such a nice expression, and Theorem 14.6 works around this problem.
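
Schur's formulae make the case $p = 1$, $q = \infty$ of Theorem 14.6 easy to test: with $1/r = t$, the bound (14.8) reads $M_r \le M_1^{1/r} M_\infty^{1 - 1/r}$. The sketch below evaluates (14.16) and (14.17) for a real matrix and compares $\|Ax\|_r$ with the interpolated bound on sample vectors. It only tests the inequality from below on the vectors supplied and does not compute $M_r$ itself; the function names are hypothetical.

```python
import random

def column_sum_norm(a):
    """M_1 as in (14.16): largest column sum of absolute values."""
    n = len(a)
    return max(sum(abs(a[j][k]) for j in range(n)) for k in range(n))

def row_sum_norm(a):
    """M_infinity as in (14.17): largest row sum of absolute values."""
    return max(sum(abs(v) for v in row) for row in a)

def p_norm(x, p):
    """The usual p-norm of a vector, for finite p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def apply_matrix(a, x):
    """Compute the vector with entries sum_k a[j][k] * x[k]."""
    return [sum(ajk * xk for ajk, xk in zip(row, x)) for row in a]

def interpolated_bound(a, r):
    """M_1^(1/r) * M_infinity^(1 - 1/r), the right side of (14.8) with t = 1/r."""
    t = 1.0 / r
    return column_sum_norm(a) ** t * row_sum_norm(a) ** (1.0 - t)

random.seed(0)
a = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
x = [random.uniform(-1, 1) for _ in range(4)]
for r in (1.5, 2.0, 3.0):
    print(r, p_norm(apply_matrix(a, x), r), interpolated_bound(a, r) * p_norm(x, r))
```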



Remark 14.18 The quantity $M_p$ is the same as the operator norm of the linear transformation associated to the matrix $(a_{j,k})$ using the $p$-norm $\|\cdot\|_p$ as in Section 4.1 on both the domain and the range. Instead of using the same $p$-norm on the domain and the range, suppose that one considers the operator norm defined using the $p_1$-norm on the domain and the $p_2$-norm on the range. In the context of Theorem 14.6, one would also consider the operator norm using a $q_1$-norm in the domain and a $q_2$-norm in the range, and one would seek an interpolation inequality for the operator norm using an $r_1$-norm on the domain and an $r_2$-norm in the range, where $r_1$ and $r_2$ are related to $p_1$, $q_1$ and $p_2$, $q_2$ in the same manner as above (using the same $t$ for $r_1$, $p_1$, $q_1$ and for $r_2$, $p_2$, $q_2$).

Riesz's proof in [Rie] allows for this more general situation under the assumption that $p_1 \le p_2$ and $q_1 \le q_2$. There are other methods which permit one to deal with arbitrary $p_1$, $p_2$, $q_1$, $q_2$. Note, however, that the condition $p_1 \le p_2$, $q_1 \le q_2$ holds in many natural situations.



14.2 A digression about convex functions 

Let $a$, $b$ be real numbers with $a < b$, and let $f(x)$ be a continuous real-valued function on $[a,b]$. Suppose that

(14.19) $f\Big(\frac{x+y}{2}\Big) \le \frac{f(x) + f(y)}{2}$

for all $x, y \in [a,b]$. Then $f$ satisfies the "complete" convexity property

(14.20) $f(\lambda\,x + (1-\lambda)\,y) \le \lambda\,f(x) + (1-\lambda)\,f(y)$

for all $\lambda \in (0,1)$ and $x, y \in [a,b]$. This is the same as Exercise 24 on p101 of [Rud1] (except for a minor change), and it is not hard to prove. For instance, one can repeat (14.19) to obtain (14.20) first with $\lambda$ equal to 1/4 or 3/4, in addition to 1/2, then with $\lambda$ of the form $j/8$, and so on. Thus (14.20) in fact holds when $\lambda$ is of the form $k/2^l$, and this permits one to derive (14.20) for all $\lambda \in (0,1)$, by continuity.

Here is a more general version of this observation. 

Lemma 14.21 Suppose that $a$, $b$ are real numbers with $a < b$, and that $f(x)$ is a real-valued function on $[a,b]$ which is continuous. Assume that for each $x, y \in [a,b]$ there is a $\lambda_{x,y} \in (0,1)$ such that

(14.22) $f(\lambda_{x,y}\,x + (1 - \lambda_{x,y})\,y) \le \lambda_{x,y}\,f(x) + (1 - \lambda_{x,y})\,f(y)$.

Then (14.20) holds for all $\lambda \in (0,1)$ and $x, y \in [a,b]$.

To prove this, define $L_{x,y}$ for all $x, y \in [a,b]$ by

(14.23) $L_{x,y} = \{\lambda \in [0,1] : f(\lambda\,x + (1-\lambda)\,y) \le \lambda\,f(x) + (1-\lambda)\,f(y)\}$.

Thus $0, 1 \in L_{x,y}$ automatically, and $\lambda_{x,y} \in L_{x,y} \cap (0,1)$ by hypothesis. The continuity of $f$ implies that $L_{x,y}$ is always a closed subset of $[0,1]$. We want to show that $L_{x,y} = [0,1]$ for all $x, y \in [a,b]$.

Fix $x, y \in [a,b]$, and suppose to the contrary that $L_{x,y} \ne [0,1]$. Because $(0,1) \setminus L_{x,y}$ is an open set, it follows that there are $\lambda_1, \lambda_2 \in L_{x,y}$ such that $\lambda_1 < \lambda_2$ and

(14.24) $(\lambda_1, \lambda_2) \subseteq (0,1) \setminus L_{x,y}$.

Define $u, v \in [a,b]$ by

(14.25) $u = \lambda_1\,x + (1 - \lambda_1)\,y$, $v = \lambda_2\,x + (1 - \lambda_2)\,y$.

By hypothesis, there is a $\lambda_{u,v} \in (0,1)$ which lies in $L_{u,v}$, so that

(14.26) $f(\lambda_{u,v}\,u + (1 - \lambda_{u,v})\,v) \le \lambda_{u,v}\,f(u) + (1 - \lambda_{u,v})\,f(v)$.

We also know that $\lambda_1$, $\lambda_2$ lie in $L_{x,y}$, and hence

(14.27) $f(u) \le \lambda_1\,f(x) + (1 - \lambda_1)\,f(y)$, $f(v) \le \lambda_2\,f(x) + (1 - \lambda_2)\,f(y)$.

Therefore,

(14.28) $f(\lambda_{u,v}\,u + (1 - \lambda_{u,v})\,v) \le (\lambda_{u,v}\,\lambda_1 + (1 - \lambda_{u,v})\,\lambda_2)\,f(x) + (\lambda_{u,v}\,(1 - \lambda_1) + (1 - \lambda_{u,v})\,(1 - \lambda_2))\,f(y)$.

Set $\alpha = \lambda_{u,v}\,\lambda_1 + (1 - \lambda_{u,v})\,\lambda_2$. Then $\alpha \in (\lambda_1, \lambda_2)$, and the preceding inequality can be rewritten as

(14.29) $f(\lambda_{u,v}\,u + (1 - \lambda_{u,v})\,v) \le \alpha\,f(x) + (1 - \alpha)\,f(y)$.

Similarly,

(14.30) $\lambda_{u,v}\,u + (1 - \lambda_{u,v})\,v = \alpha\,x + (1 - \alpha)\,y$,

and hence

(14.31) $f(\alpha\,x + (1 - \alpha)\,y) \le \alpha\,f(x) + (1 - \alpha)\,f(y)$.

In other words, $\alpha \in L_{x,y}$. This contradicts (14.24), and the lemma follows.

As a consequence of the lemma, in order to establish Theorem 14.6, it suffices to show that for each $p, q \in [1, \infty]$ there is a $t \in (0,1)$ so that (14.8) holds. This uses the observation that $M_s$ is a continuous function of $1/s$, $1/s \in [0,1]$, which is not difficult to show.



14.3 A place where the maximum is attained



We continue with the same notation and assumptions from Section 14.1.

Fix a real number $r$, $1 < r < \infty$, and let $r'$ be its conjugate exponent. There exist $x^0, y^0 \in \mathbf{C}^n$ such that

(14.32) $|A(x^0, y^0)| = M_r$

and

(14.33) $\Big(\sum_{k=1}^{n} |x_k^0|^r\Big)^{1/r} = 1$,

(14.34) $\Big(\sum_{j=1}^{n} |y_j^0|^{r'}\Big)^{1/r'} = 1$.

In other words, the supremum in the definition (14.2) of $M_r$ is attained at $x^0$, $y^0$. We are led to the conditions (14.33), (14.34) because otherwise we could multiply $x^0$ or $y^0$ by a real number greater than 1, and $A(x^0, y^0)$ would be similarly multiplied.

For this choice of $x^0$ and $y^0$ we have that

(14.35) $|A(x^0, y^0)| = \Big|\sum_{j=1}^{n} \sum_{k=1}^{n} y_j^0\,a_{j,k}\,x_k^0\Big| = \Big(\sum_{j=1}^{n} \Big|\sum_{k=1}^{n} a_{j,k}\,x_k^0\Big|^r\Big)^{1/r}$.

More precisely, if the second equality is replaced by $\le$, then it always holds, by Hölder's inequality. We have equality in this case because otherwise we could replace $y^0$ with an element of $\mathbf{C}^n$ for which equality does occur (and with (14.34)), and the value of $|A(x^0, y^0)|$ would increase, i.e., it would not be the maximum.

For the same reason,

(14.36) $|A(x^0, y^0)| = \Big|\sum_{j=1}^{n} \sum_{k=1}^{n} y_j^0\,a_{j,k}\,x_k^0\Big| = \Big(\sum_{k=1}^{n} \Big|\sum_{j=1}^{n} y_j^0\,a_{j,k}\Big|^{r'}\Big)^{1/r'}$.



14.4 The rest of the argument

Because of the second equality in (14.35), we have that

(14.37) $\Big|\sum_{k=1}^{n} a_{j,k}\,x_k^0\Big| = \mu\,|y_j^0|^{r'-1}$

for $j = 1, 2, \ldots, n$, where $\mu$ is a constant. This can be obtained from the proof of Hölder's inequality (and the conditions for equality in the arithmetic-geometric mean inequalities). Similarly,

(14.38) $\Big|\sum_{j=1}^{n} y_j^0\,a_{j,k}\Big| = \nu\,|x_k^0|^{r-1}$

for $k = 1, 2, \ldots, n$ and some constant $\nu$.

From (14.35) and (14.36) we have that

(14.39) $M_r = \Big(\sum_{j=1}^{n} \Big|\sum_{k=1}^{n} a_{j,k}\,x_k^0\Big|^r\Big)^{1/r} = \Big(\sum_{k=1}^{n} \Big|\sum_{j=1}^{n} y_j^0\,a_{j,k}\Big|^{r'}\Big)^{1/r'}$.

This uses the fact that $M_r = |A(x^0, y^0)|$, by definitions, and (14.33), (14.34). Combining the first equality with (14.37) we obtain

(14.40) $M_r = \mu\,\Big(\sum_{j=1}^{n} |y_j^0|^{r(r'-1)}\Big)^{1/r}$.

On the other hand, $r(r'-1) = r'$, since $1/r + 1/r' = 1$. Thus we may apply (14.34) to get

(14.41) $M_r = \mu$.

For the same reasons,

(14.42) $M_r = \nu$.

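Now suppose that $p$, $q$, and $t$ are real numbers such that $1 \le p < q < \infty$, $0 < t < 1$, and $1/r = t/p + (1-t)/q$. Then

(14.43) $\Big(\sum_{j=1}^{n} \Big|\sum_{k=1}^{n} a_{j,k}\,x_k^0\Big|^p\Big)^{1/p} \le M_p\,\Big(\sum_{k=1}^{n} |x_k^0|^p\Big)^{1/p}$

and

(14.44) $\Big(\sum_{k=1}^{n} \Big|\sum_{j=1}^{n} y_j^0\,a_{j,k}\Big|^{q'}\Big)^{1/q'} \le M_q\,\Big(\sum_{j=1}^{n} |y_j^0|^{q'}\Big)^{1/q'}$,

where $q'$ is the conjugate exponent of $q$. (Note that $p, q' < \infty$ under the present conditions.) These inequalities follow from (14.9) and (14.13). Applying our earlier computations we get that

(14.45) $M_r\,\Big(\sum_{j=1}^{n} |y_j^0|^{p(r'-1)}\Big)^{1/p} \le M_p\,\Big(\sum_{k=1}^{n} |x_k^0|^p\Big)^{1/p}$

and

(14.46) $M_r\,\Big(\sum_{k=1}^{n} |x_k^0|^{q'(r-1)}\Big)^{1/q'} \le M_q\,\Big(\sum_{j=1}^{n} |y_j^0|^{q'}\Big)^{1/q'}$.

We can take the $t$th and $(1-t)$th powers of these inequalities and then multiply to get

(14.47) $M_r\,\Big(\sum_{j=1}^{n} |y_j^0|^{p(r'-1)}\Big)^{t/p} \Big(\sum_{k=1}^{n} |x_k^0|^{q'(r-1)}\Big)^{(1-t)/q'} \le M_p^t\,M_q^{1-t}\,\Big(\sum_{k=1}^{n} |x_k^0|^p\Big)^{t/p} \Big(\sum_{j=1}^{n} |y_j^0|^{q'}\Big)^{(1-t)/q'}$.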

We are going to need some identities with indices. Let us first check that

(14.48) $t\,\Big(\frac{1}{p} - \frac{1}{r}\Big) = (1-t)\,\Big(\frac{1}{q'} - \frac{1}{r'}\Big)$.

Because $1/r = t/p + (1-t)/q$, we have that

(14.49) $t\,\Big(\frac{1}{p} - \frac{1}{r}\Big) = t\,(1-t)\,\Big(\frac{1}{p} - \frac{1}{q}\Big)$.

Similarly, $1/r' = t/p' + (1-t)/q'$ (simply by using the formula for $1/r$ in $1/r' = 1 - 1/r$), and this leads to

(14.50) $(1-t)\,\Big(\frac{1}{q'} - \frac{1}{r'}\Big) = (1-t)\,t\,\Big(\frac{1}{q'} - \frac{1}{p'}\Big) = (1-t)\,t\,\Big(\frac{1}{p} - \frac{1}{q}\Big)$.

The last step uses $1/p' = 1 - 1/p$, $1/q' = 1 - 1/q$. The identity (14.48) follows directly from these computations.

Suppose that

(14.51) $\frac{t}{r} = \frac{1-t}{r'}$.

Then (14.48) implies that $t/p = (1-t)/q'$. This implies in turn that

(14.52) $p\,(r'-1) = q'$, $q'\,(r-1) = p$.

Indeed, $r' - 1 = r'/r$ and $r - 1 = r/r'$, since $1/r + 1/r' = 1$. Thus $r' - 1 = (1-t)/t$ and $r - 1 = t/(1-t)$, by (14.51). This leads to (14.52).

Thus, under the assumption (14.51), we obtain

(14.53) $M_r \le M_p^t\,M_q^{1-t}$

from (14.47). That is, the sums involving $x^0$ and $y^0$ on the left and right sides of (14.47) exactly match up under these conditions, by the computations in the previous paragraph.

Note that for each $p$, $q$ such that $1 \le p < q < \infty$, there always is a $t \in (0,1)$ so that if $r$ is given by $1/r = t/p + (1-t)/q$, then (14.51) is satisfied. Theorem 14.6 follows, as indicated at the end of Section 14.2.
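
The index bookkeeping can be checked with exact rational arithmetic. The sketch below assumes that the matching condition (14.51) has the form $t/r = (1-t)/r'$, which is equivalent to $t = 1/r'$; combined with $1/r = t/p + (1-t)/q$ this gives the closed form $t = (1 - 1/q)/(1 + 1/p - 1/q)$ used in the code. Exponents are passed through their reciprocals so that $q = \infty$ (reciprocal 0) can be handled too; all function names are hypothetical.

```python
from fractions import Fraction

def matching_parameter(inv_p, inv_q):
    """Solve t/r = (1-t)/r' together with 1/r = t/p + (1-t)/q for t."""
    return (1 - inv_q) / (1 + inv_p - inv_q)

def check_indices(inv_p, inv_q):
    """Return t and whether the two identities in (14.52) and (14.51) hold exactly."""
    t = matching_parameter(inv_p, inv_q)
    inv_r = t * inv_p + (1 - t) * inv_q
    r = 1 / inv_r
    r_conj = 1 / (1 - inv_r)          # conjugate exponent r'
    p = 1 / inv_p
    q_conj = 1 / (1 - inv_q)          # conjugate exponent q'
    first = p * (r_conj - 1) == q_conj
    second = q_conj * (r - 1) == p
    matching = t * inv_r == (1 - t) * (1 - inv_r)
    return t, first, second, matching

# Example: p = 2, q = 4 (reciprocals 1/2 and 1/4).
print(check_indices(Fraction(1, 2), Fraction(1, 4)))
```

Using `Fraction` avoids rounding, so the identities are tested as exact equalities rather than up to a tolerance.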



14.5 A reformulation 

Let $T$ be a linear mapping from the vector space of dyadic step functions on $[0, 1)$ into itself. Suppose that $p$ and $q$ satisfy $1 < p < q \le \infty$, and that there are nonnegative real numbers $N_p$, $N_q$ such that

(14.54)  $\Big(\int_{[0,1)} |T(f)(x)|^p\,dx\Big)^{1/p} \le N_p \Big(\int_{[0,1)} |f(x)|^p\,dx\Big)^{1/p}$

and

(14.55)  $\Big(\int_{[0,1)} |T(f)(x)|^q\,dx\Big)^{1/q} \le N_q \Big(\int_{[0,1)} |f(x)|^q\,dx\Big)^{1/q}$

for all $f$. If $q = \infty$, then the latter should be replaced with

(14.56)  $\sup_{x \in [0,1)} |T(f)(x)| \le N_q \sup_{[0,1)} |f(x)|$,

as usual.

Let $t \in (0, 1)$ be given, and define $r$ by $1/r = t/p + (1 - t)/q$. Then

(14.57)  $\Big(\int_{[0,1)} |T(f)(x)|^r\,dx\Big)^{1/r} \le N_p^t\, N_q^{1-t} \Big(\int_{[0,1)} |f(x)|^r\,dx\Big)^{1/r}$

for all dyadic step functions $f$.

This assertion can be derived from Theorem 14.6, as follows. Let $m$ be a positive integer. We have that

(14.58)  $\Big(\int_{[0,1)} |E_m(T(f))(x)|^p\,dx\Big)^{1/p} \le N_p \Big(\int_{[0,1)} |f(x)|^p\,dx\Big)^{1/p}$

for all $f$, and analogously for $q$ instead of $p$. Indeed,

(14.59)  $\int_{[0,1)} |E_m(g)(x)|^p\,dx \le \int_{[0,1)} |g(x)|^p\,dx$

for all $g$, as can be derived from Jensen's inequality. We also have that $\sup_{[0,1)} |E_m(g)| \le \sup_{[0,1)} |g|$. This implies that our hypotheses for $T$ hold as well for $E_m \circ T$.



Theorem 14.6 can be applied to obtain that

(14.60)  $\Big(\int_{[0,1)} |E_m(T(f))(x)|^r\,dx\Big)^{1/r} \le N_p^t\, N_q^{1-t} \Big(\int_{[0,1)} |f(x)|^r\,dx\Big)^{1/r}$

for all step functions $f$ on $[0, 1)$ which are constant on dyadic intervals of length $2^{-m}$. In other words, we think of $E_m \circ T$ as a linear transformation from the space of step functions on $[0, 1)$ which are constant on dyadic intervals of length $2^{-m}$ to itself. This space can be identified with $\mathbf{C}^n$, $n = 2^m$, in a simple way, with the components of vectors corresponding to the values of functions on the dyadic subintervals of $[0, 1)$ of length $2^{-m}$. (One can work with real-valued functions instead of complex-valued functions, as long as $T$ maps real functions to real functions.) In this identification, the integrals of powers of absolute values of functions that we have here correspond to the sums that we considered before, except for constant factors, which can easily be handled. This permits us to derive (14.60) from the earlier result for linear mappings on $\mathbf{C}^n$. (One might note that this process is reversible, i.e., one could go from vectors in $\mathbf{C}^n$ to functions.)

Once one has (14.60) for step functions which are constant on dyadic intervals of length $2^{-m}$, where $m$ is arbitrary, it is easy to derive (14.57) for dyadic step functions in general, since $E_m(T(f)) = T(f)$ as soon as $m$ is large enough that $f$ and $T(f)$ are both constant on dyadic intervals of length $2^{-m}$. (Of course one could extend this to other kinds of functions too.)
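
The identification of step functions with vectors can be made concrete in a few lines of Python. The following is a minimal sketch under our own conventions (the function names are hypothetical, and $E_m$ is taken to be averaging over dyadic intervals of length $2^{-m}$): a step function which is constant on $n$ equal subintervals of $[0, 1)$ is stored as its list of $n$ values, so that the integral of $|f|^p$ is the sum of the $|f_i|^p$ times the constant factor $1/n$.

```python
def lp_norm_step(values, p):
    """L^p norm on [0, 1) of the step function taking the given values on
    equal-length subintervals; p = float("inf") gives the supremum."""
    if p == float("inf"):
        return max(abs(v) for v in values)
    n = len(values)
    # Integral of |f|^p = (1/n) * sum of |f_i|^p: a sum times a constant factor.
    return (sum(abs(v) ** p for v in values) / n) ** (1.0 / p)

def dyadic_average(values, m):
    """E_m applied to a step function given by 2**l values (l >= m): replace
    the values on each dyadic interval of length 2**-m by their average."""
    block = len(values) // 2 ** m
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

# Averaging does not increase any of these norms (Jensen's inequality).
g = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
for p in (1.0, 1.5, 2.0, 3.0, float("inf")):
    for m in range(4):
        assert lp_norm_step(dyadic_average(g, m), p) <= lp_norm_step(g, p) + 1e-12
```

The factor $1/n$ is the same on both sides of each norm inequality, which is why the constant factors mentioned above cause no trouble.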



14.6 A generalization 

The maximal and square function operators $M$, $M_l$, $S$, $S_l$ are not linear, but the previous results also work for them. The reason for this is that although these operators are not linear, they are given by suprema of absolute values of other linear mappings.

More precisely, suppose that we have an operator $T$ which is not linear, but which can be described in terms of a family $\{T_\alpha\}_\alpha$ of linear operators in the following manner: $|T_\alpha(f)(x)| \le T(f)(x)$ for all $\alpha$, $f$, and $x$; and for any given function $f$ there is an $\alpha$ such that $T(f) = |T_\alpha(f)|$ (i.e., $T(f)(x) = |T_\alpha(f)(x)|$ for all $x$). (One could adjust this to allow for suitable approximations.) Then there are interpolation results for $T$ which are just like the ones for linear operators. That is, one would assume conditions like (14.54), (14.55), (14.56) for $T$, and these imply analogous conditions for all of the linear operators $T_\alpha$, uniformly in $\alpha$. One could then apply the interpolation result to get (14.57) for $T_\alpha$, uniformly in $\alpha$, and the analogous assertion for $T$ would be a consequence because of the way that $T(f)$ can be given in terms of $T_\alpha$ for some $\alpha$, depending on $f$.

This is the situation for the maximal and square function operators mentioned above. For maximal functions, one lets $\alpha$ run through nonnegative integer-valued functions on $[0, 1)$, and for $T_\alpha$ one puts

(14.61)  $T_\alpha(f)(x) = E_{\alpha(x)}(f)(x)$.

Thus $T_\alpha(f)$ is linear in $f$, $T_\alpha$ can be bounded in terms of maximal functions, and one can have inequalities in the other direction by choosing $\alpha$ properly for a given $f$. In the context of square functions, one lets $\alpha(x)$ be a sequence-valued function such that

(14.62)  $\Big(\sum_{i=0}^\infty |\alpha_i(x)|^2\Big)^{1/2} \le 1$

for all $x$. For the counterpart of $T_\alpha$, one considers linear operators of the form

(14.63)  $\alpha_0(x)\, E_0(f)(x) + \sum_{i=1}^\infty \alpha_i(x)\, \big(E_i(f)(x) - E_{i-1}(f)(x)\big)$.

These expressions are linear in $f$ and bounded by square functions, because of the Cauchy-Schwarz inequality. One can go the other way by choosing $\alpha$ properly for a given $f$.

In this manner, one can get the same kind of interpolation estimates for $M$, $M_l$, $S$, $S_l$ as for linear operators.
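
The linearization of the maximal function can be illustrated numerically. The sketch below is ours and rests on assumptions that are not taken from the text: a step function is stored as a list of $2^l$ values, $E_j$ is averaging over dyadic intervals of length $2^{-j}$, the maximal function is the maximum of $|E_j(f)|$ over $0 \le j \le l$, and all function names are hypothetical. It shows that $|T_\alpha(f)|$ never exceeds the maximal function, that a suitable $\alpha$ (depending on $f$) gives equality, and that $T_\alpha$ is linear for fixed $\alpha$.

```python
def dyadic_average(values, j):
    """E_j: average a step function (2**l values, l >= j) over the dyadic
    intervals of length 2**-j, keeping the original resolution."""
    block = len(values) // 2 ** j
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

def maximal_function(values, levels):
    """Pointwise maximum of |E_j(f)| over 0 <= j <= levels."""
    averages = [dyadic_average(values, j) for j in range(levels + 1)]
    return [max(abs(a[i]) for a in averages) for i in range(len(values))]

def linearized(values, alpha):
    """T_alpha(f)(x) = E_{alpha(x)}(f)(x); alpha lists one level per point."""
    averages = {j: dyadic_average(values, j) for j in set(alpha)}
    return [averages[alpha[i]][i] for i in range(len(values))]

def extremal_alpha(values, levels):
    """A choice of alpha, depending on f, with |T_alpha(f)| = maximal function."""
    averages = [dyadic_average(values, j) for j in range(levels + 1)]
    return [max(range(levels + 1), key=lambda j: abs(averages[j][i]))
            for i in range(len(values))]

f = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
alpha = extremal_alpha(f, 3)
print(maximal_function(f, 3))
print([abs(v) for v in linearized(f, alpha)])  # agrees with the line above
```

Since each $T_\alpha$ is linear and dominated by the maximal function, any bound for the maximal function passes to all the $T_\alpha$ uniformly, which is the point of the argument above.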



Chapter 15 

Quasisymmetric mappings 



15.1 Basic notions 

Let $(M, d(x,y))$ and $(N, \rho(u,v))$ be metric spaces. Thus $M$ is a nonempty set and $d(x, y)$ is a nonnegative real-valued function on $M \times M$ such that $d(x, y) = 0$ exactly when $x = y$, $d(x, y) = d(y, x)$ for all $x, y \in M$, and

(15.1)  $d(x, z) \le d(x, y) + d(y, z)$ for all $x, y, z \in M$

(the triangle inequality), and similarly for $(N, \rho(u,v))$.

Suppose that $\eta(t)$ is a nonnegative real-valued function on $[0, \infty)$. Let us say that $\eta$ is admissible if $\eta(0) = 0$ and

(15.2)  $\lim_{t \to 0} \eta(t) = 0$.

A mapping $f : M \to N$ is said to be quasisymmetric if $f$ is not constant (unless $M$ contains only one element) and if there is an admissible function $\eta : [0, \infty) \to [0, \infty)$ such that

(15.3)  $\rho(f(y), f(x)) \le \eta(t)\, \rho(f(z), f(x))$

whenever $x$, $y$, and $z$ are elements of $M$ and $t$ is a positive real number such that

(15.4)  $d(y, x) \le t\, d(z, x)$.

Roughly speaking, this condition asks that relative distances be approximately preserved by $f$. For instance, if $y$ is much closer to $x$ than $z$ is to $x$, then (15.4) holds with $t$ small, and (15.3) implies that $f(y)$ is relatively close to $f(x)$, compared to the distance between $f(z)$ and $f(x)$. Similarly, if $y$ is not too close to $x$ compared to the distance from $z$ to $x$, but still $y$ is not too far from $x$ compared to the distance from $z$ to $x$, then one can have (15.4) with a $t$ which is not too small and not too large, and (15.3) implies that $f(y)$ is not too far from $f(x)$ compared to the distance from $f(z)$ to $f(x)$.

We shall describe some examples in the next two sections, but first let us 
mention a few simple facts pertaining to the definition. 

A quasisymmetric mapping $f : M \to N$ is one-to-one. Indeed, if $x$ and $z$ are two distinct points in $M$ such that $f(x) = f(z)$, then one can use the condition above to obtain that $f(x) = f(y)$ for all $y \in M$. Constant mappings are excluded from the definition of quasisymmetric mappings, except in the case where $M$ has only one element, and we conclude that $f$ is one-to-one.

If $\eta : [0, \infty) \to [0, \infty)$ is admissible, then the function $\hat\eta : [0, \infty) \to [0, \infty)$ given by

(15.5)  $\hat\eta(t) = \inf\{\eta(s) : s \ge t\}$

satisfies

(15.6)  $\hat\eta(t) \le \eta(t)$ for all $t \in [0, \infty)$

and is admissible in its own right. It is easy to see that $\hat\eta$ is monotone increasing, and is hence a kind of regularization of $\eta$. On the other hand, if $f$ is any quasisymmetric mapping relative to the function $\eta$, then $f$ is also quasisymmetric relative to $\hat\eta$. This follows easily from the definition.
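
Here is a small Python sketch of this regularization on a finite grid. It is only an approximation under our own assumptions: the infimum over all $s \ge t$ is replaced by a minimum over the grid points $s \ge t$, and the sample function and the name `regularize_on_grid` are hypothetical.

```python
import math

def regularize_on_grid(eta_values):
    """Given eta sampled on an increasing grid, return the running minimum
    taken from the right, i.e. min of eta over grid points s >= t."""
    out = list(eta_values)
    for i in range(len(out) - 2, -1, -1):
        out[i] = min(out[i], out[i + 1])
    return out

# A non-monotone sample function with eta(0) = 0 and eta(t) -> 0 as t -> 0.
grid = [k / 100.0 for k in range(501)]
eta = [t * (1.5 + math.sin(8.0 * t)) for t in grid]
eta_hat = regularize_on_grid(eta)

# The regularized values are monotone increasing and lie below eta.
assert all(a <= b for a, b in zip(eta_hat, eta_hat[1:]))
assert all(a <= b for a, b in zip(eta_hat, eta))
```

On a grid the running minimum only sees finitely many values of $s$, so it may overestimate the true infimum; the monotonicity and the comparison with the original function hold exactly.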

In the opposite direction, if $f$ is a quasisymmetric mapping relative to a function $\eta$, then $f$ is also quasisymmetric relative to any function $\theta : [0, \infty) \to [0, \infty)$ such that

(15.7)  $\eta(t) \le \theta(t)$ for all $t \in [0, \infty)$.

In particular, one can replace $\eta(t)$ with $\theta(t) = \eta(t) + \epsilon\, t$ for any fixed $\epsilon > 0$. If $\eta(t)$ is already monotone increasing on $[0, \infty)$, which can always be arranged as in the previous paragraph, then this choice of $\theta(t)$ will be strictly increasing. It will also tend to $\infty$ as $t \to \infty$, so that $\theta$ maps $[0, \infty)$ onto $[0, \infty)$.

Now assume that $f : M \to N$ is quasisymmetric relative to a function $\eta$ and that $f(M) = N$. In this case the inverse mapping $f^{-1} : N \to M$ is defined, since $f$ is one-to-one, and let us show that it is quasisymmetric. By the remarks in the previous paragraph, we may assume that $\eta : [0, \infty) \to [0, \infty)$ is invertible. We would like to show that $f^{-1}$ is quasisymmetric relative to $\sigma : [0, \infty) \to [0, \infty)$ defined by

(15.8)  $\sigma(t) = \big(\eta^{-1}(t^{-1})\big)^{-1}$ when $t > 0$, $\sigma(0) = 0$.

This is equivalent to the assertion that

(15.9)  $d(y, x) \le \sigma(t)\, d(z, x)$

whenever $x$, $y$, and $z$ are elements of $M$ and $t$ is a positive real number such that

(15.10)  $\rho(f(y), f(x)) \le t\, \rho(f(z), f(x))$.

It is convenient to rephrase this as the statement that

(15.11)  $d(y, x) \le t\, d(z, x)$

whenever $x$, $y$, and $z$ are elements of $M$ and $t$ is a positive real number such that

(15.12)  $\rho(f(y), f(x)) \le \sigma^{-1}(t)\, \rho(f(z), f(x))$.

We may as well assume that $y \ne x$, since otherwise (15.11) is trivial. If (15.11) does not hold, then

(15.13)  $d(y, x) > t\, d(z, x)$,

or

(15.14)  $d(z, x) < t^{-1}\, d(y, x)$,

and hence

(15.15)  $d(z, x) \le s\, d(y, x)$

for some $s < t^{-1}$. It follows that

(15.16)  $\rho(f(z), f(x)) \le \eta(s)\, \rho(f(y), f(x))$,

using the original quasisymmetry condition for $f$ with the roles of $y$ and $z$ reversed, and with $t$ replaced with $s$. Since $\eta$ is strictly increasing and $f(y) \ne f(x)$, this implies in turn that

(15.17)  $\rho(f(z), f(x)) < \eta(t^{-1})\, \rho(f(y), f(x))$,

or

(15.18)  $\eta(t^{-1})^{-1}\, \rho(f(z), f(x)) < \rho(f(y), f(x))$.

Because $\sigma^{-1}(t) = \eta(t^{-1})^{-1}$, this is the same as saying that (15.12) does not hold, which is what we want.
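
The passage from $\eta$ to the function $\sigma$ for the inverse mapping can be sketched in Python. This is only an illustration under assumptions of ours: we use the formula $\sigma(t) = 1/\eta^{-1}(1/t)$ as read off from the argument above, the helper name `sigma_from_eta` is hypothetical, and the sample function $\eta(t) = 4 t^2$ is chosen only because its inverse is explicit.

```python
def sigma_from_eta(eta_inverse):
    """Build sigma(t) = 1 / eta_inverse(1 / t) for t > 0, with sigma(0) = 0.

    eta_inverse is assumed to be the inverse of a strictly increasing
    admissible function eta mapping [0, inf) onto [0, inf).
    """
    def sigma(t):
        if t == 0:
            return 0.0
        return 1.0 / eta_inverse(1.0 / t)
    return sigma

# Sample admissible function and its inverse (an illustrative choice).
def eta(t):
    return 4.0 * t ** 2

def eta_inverse(u):
    return (u / 4.0) ** 0.5

sigma = sigma_from_eta(eta_inverse)

# sigma inverts t -> 1 / eta(1 / t), which is the relation used in the proof.
for t in (0.1, 1.0, 7.0):
    assert abs(sigma(1.0 / eta(1.0 / t)) - t) < 1e-9
```

For this sample $\eta$ one gets $\sigma(t) = 2\sqrt{t}$, so an exponent $2$ for the mapping turns into an exponent $1/2$ for its inverse.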

The composition of two quasisymmetric mappings is quasisymmetric. For the associated function $\eta$ for the composition, one can use the composition of the functions $\eta$ associated to the two mappings being composed.



15.2 Examples

For any metric space $(M, d(x,y))$, the identity mapping on $M$ is quasisymmetric, with respect to the function $\eta(t) = t$. More generally, let $(M, d(x, y))$ and $(N, \rho(u,v))$ be metric spaces, and suppose that $f : M \to N$ is an isometry, so that

(15.19)  $\rho(f(x), f(y)) = d(x, y)$ for all $x, y \in M$.

Then $f$ is quasisymmetric with $\eta(t) = t$.

A mapping $f : M \to N$ is said to be bilipschitz with constant $C > 0$ if

(15.20)  $C^{-1}\, d(x, y) \le \rho(f(x), f(y)) \le C\, d(x, y)$

for all $x, y \in M$. If $f$ is bilipschitz with constant $C$, then $f$ is quasisymmetric with $\eta(t) = C^2\, t$. Note that $f$ is an isometry exactly when it is bilipschitz with constant 1.
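
The factor $C^2$ comes from using the upper bilipschitz bound for the pair $x, y$ and the lower bound for the pair $x, z$: if $C^{-1} d \le \rho \le C\, d$ and $d(y,x) \le t\, d(z,x)$, then $\rho(f(y), f(x)) \le C\, d(y,x) \le C\, t\, d(z,x) \le C^2\, t\, \rho(f(z), f(x))$. The following Python sketch checks this on random triples for one sample mapping of the real line. The mapping $x \mapsto 2x + \sin x$ and all names here are our own illustrative choices; its derivative lies between 1 and 3, so it is bilipschitz with constant $C = 3$.

```python
import math
import random

def f(x):
    """Sample bilipschitz mapping of the real line (derivative in [1, 3])."""
    return 2.0 * x + math.sin(x)

C = 3.0  # bilipschitz constant for the sample mapping above

def quasisymmetry_holds(x, y, z):
    """Check |f(y) - f(x)| <= C**2 * t * |f(z) - f(x)| with t = |y - x| / |z - x|."""
    t = abs(y - x) / abs(z - x)
    return abs(f(y) - f(x)) <= C * C * t * abs(f(z) - f(x)) + 1e-9

rng = random.Random(0)
triples = [(rng.uniform(-10, 10), rng.uniform(-10, 10), rng.uniform(-10, 10))
           for _ in range(2000)]
print(all(quasisymmetry_holds(x, y, z) for x, y, z in triples if z != x))
```

A random test of this kind cannot prove the inequality, of course; it is only a way to catch mistakes in the constants.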

Let us focus for the moment on the case where $M = N = \mathbf{R}^n$ for some positive integer $n$, with the usual metric $|x - y|$, where $|x|$ denotes the Euclidean norm of $x$. Suppose that $f : \mathbf{R}^n \to \mathbf{R}^n$ is an isometry. We can write $f$ as $T(x) + x_0$, where $x_0$ is a fixed element of $\mathbf{R}^n$ and $T : \mathbf{R}^n \to \mathbf{R}^n$ is an isometry which fixes the origin, by taking $x_0 = f(0)$. A well-known result
(which is not too hard to prove) states that T is then linear. Thus T is an 



orthogonal transformation, as discussed near the end of Section pA . 

A mapping $f : \mathbf{R}^n \to \mathbf{R}^n$ is called a similarity if it can be written as $f(x) = a\, T(x) + x_0$, where $a$ is a positive real number, $T$ is an orthogonal transformation on $\mathbf{R}^n$, and $x_0$ is a fixed element of $\mathbf{R}^n$. It is easy to see that a similarity is quasisymmetric on $\mathbf{R}^n$, with $\eta(t) = t$. A similarity $f(x) = a\, T(x) + x_0$ is also bilipschitz with constant $C = \max(a, a^{-1})$. In particular, the bilipschitz constant depends on the scale factor $a$, while the quasisymmetry function $\eta$ does not.

Conversely, if $f : \mathbf{R}^n \to \mathbf{R}^n$ is quasisymmetric with $\eta(t) = t$, then $f$ is a similarity. Here is an outline of the argument. First, if $f$ has this property, then $f^{-1}$ does too, as in the previous section. Hence

(15.21)  $|f(y) - f(x)| = t\, |f(z) - f(x)|$

when $x, y, z \in \mathbf{R}^n$ and $t > 0$ satisfy

(15.22)  $|y - x| = t\, |z - x|$.

Using this, one can verify that there is an $a > 0$ such that

(15.23)  $|f(u) - f(v)| = a\, |u - v|$

for all $u, v \in \mathbf{R}^n$. In other words, $a^{-1} f(x)$ is an isometry on $\mathbf{R}^n$, and hence $f$ is a similarity.

Now suppose that $b$ is a positive real number, and consider the mapping $f$ on $\mathbf{R}^n$ defined by

(15.24)  $f(x) = |x|^{b-1}\, x$.

At $x = 0$ we interpret this as meaning that $f(0) = 0$ (which would really only be in question for $b < 1$, and even then this is the natural choice in terms of continuity). We leave it as an exercise to show that these mappings are quasisymmetric.
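
Before attempting the exercise it can be helpful to experiment numerically. The Python sketch below is ours and is not a proof: it implements the mapping $x \mapsto |x|^{b-1} x$ in the plane, and samples triples with $|y - x| = t\, |z - x|$ to estimate from below the smallest admissible value of $\eta(t)$. All function names, the sampling region, and the number of trials are arbitrary choices made for illustration.

```python
import math
import random

def power_map(x, b):
    """x -> |x|**(b - 1) * x on R^n, with 0 -> 0."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm == 0.0:
        return tuple(0.0 for _ in x)
    scale = norm ** (b - 1.0)
    return tuple(scale * c for c in x)

def dist(u, v):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(u, v)))

def distortion_ratio(x, y, z, b):
    """|f(y) - f(x)| / |f(z) - f(x)| for f = power_map(., b); needs z != x."""
    fx, fy, fz = (power_map(p, b) for p in (x, y, z))
    return dist(fy, fx) / dist(fz, fx)

def sampled_distortion(t, b, trials=2000, seed=0):
    """Largest observed ratio over random planar triples with |y - x| = t |z - x|."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        z = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if dist(z, x) == 0.0:
            continue
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = t * dist(z, x)
        y = (x[0] + radius * math.cos(angle), x[1] + radius * math.sin(angle))
        worst = max(worst, distortion_ratio(x, y, z, b))
    return worst

for t in (0.1, 1.0, 10.0):
    print(t, sampled_distortion(t, 2.0), sampled_distortion(t, 0.5))
```

Because $f(\lambda x) = \lambda^b f(x)$ for $\lambda > 0$, the ratio is unchanged when $x$, $y$, $z$ are rescaled together, so sampling in a bounded region loses nothing as far as scale is concerned.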

As an extension of this example, suppose that for each real number $s$ one chooses an orthogonal transformation $R_s$ on $\mathbf{R}^n$, in such a way that

(15.25)  $\|R_s - R_t\| \le C\, |s - t|$

for some constant $C$ and all $s, t \in \mathbf{R}$. Here $\| \cdot \|$ denotes the operator norm for linear transformations on $\mathbf{R}^n$ (and any other norm would work practically as well, affecting only the constant $C$ and not this Lipschitz condition otherwise). Then the mapping on $\mathbf{R}^n$ given by

(15.26)  $x \mapsto |x|^{b-1}\, R_{\log |x|}(x)$ when $x \ne 0$,  $0 \mapsto 0$

is also quasisymmetric, for all $b > 0$.

This mapping is actually the composition of (15.24) and the mapping defined by

(15.27)  $x \mapsto R_{\log |x|}(x)$ when $x \ne 0$,  $0 \mapsto 0$.

One can show that (15.27) is bilipschitz on $\mathbf{R}^n$, under the assumption (15.25). As a result, the quasisymmetry of (15.26) follows from that of (15.24).



15.3 Cantor sets 

Let $K$ denote the usual Cantor "middle-thirds" set contained in $[0, 1]$, as in [Rud1]. Thus $K$ is a closed set obtained by removing the (open) middle-third $(1/3, 2/3)$ from $[0, 1]$, then removing the middle-thirds of the two intervals that result from that, then removing the middle-thirds of the four intervals coming from the previous step, and so on. One can write $K$ as $\bigcap_{j=0}^\infty E_j$, where $E_j$ is the union of $2^j$ disjoint closed intervals of length $3^{-j}$ produced after $j$ steps of the construction, with $E_0 = [0, 1]$ and $E_{j+1} \subseteq E_j$ for all $j$.

If $r$ is a real number such that $0 < r < 1$, we can define an analogous set $K(r) \subseteq [0, 1]$ using $r$ instead of $1/3$. In the first step one removes $(1/2 - r/2, 1/2 + r/2)$ from $[0, 1]$ to get two closed intervals of length $(1 - r)/2$, at the ends of $[0, 1]$. After $j$ steps, one gets a set $E_j(r)$ which is the union of $2^j$ disjoint closed intervals of length $((1 - r)/2)^j$, where $E_0(r) = [0, 1]$ and $E_{j+1}(r) \subseteq E_j(r)$ for all $j$. As before, we can define $K(r)$ to be $\bigcap_{j=0}^\infty E_j(r)$, and $K(1/3)$ is equal to the set $K$ from the previous paragraph.

Fix $r \in (0, 1)$. There is a natural one-to-one mapping $h_r$ from $K$ onto $K(r)$ that one can define. The basic idea is that for each nonnegative integer $j$ there are $2^j$ intervals in $E_j$ and in $E_j(r)$, and for each positive integer $m$ such that $m \le 2^j$ we want $h_r$ to map the intersection of $K$ with the $m$th interval of length $3^{-j}$ in $E_j$ to the intersection of $K(r)$ with the $m$th interval of length $((1 - r)/2)^j$ in $E_j(r)$. When we refer to the $m$th interval in $E_j$ or $E_j(r)$, we go from left to right. It is easy to see that these requirements for different $j$'s are compatible with each other.

Note that 0 and 1 lie in both $K$ and $K(r)$, and $h_r$ is defined so that $h_r(0) = 0$ and $h_r(1) = 1$. Similarly, for $j \ge 1$ the endpoints of the intervals in $E_j$ all lie in $K$ (because they remain in $E_i$ for all $i \ge j$), and the endpoints of the intervals in $E_j(r)$ lie in $K(r)$ for the same reason. The endpoints of the intervals in $E_j$ are sent to the corresponding endpoints of the corresponding intervals in $E_j(r)$ by $h_r$. These assignments for different $j$'s are compatible with each other because of the way that the various intervals fit together.

Notice also that when $r \ne 1/3$, $h_r$ is not a linear function (of the form $c\,x + d$) on the intersection of $K$ with any open interval that contains an element of $K$.

One can view $K$ and $K(r)$ as metric spaces, using the ordinary Euclidean metric $d(x, y)$ on each. It is not too difficult to show that $h_r : K \to K(r)$ is a quasisymmetric mapping.
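
One way to make the mapping $h_r$ concrete is to address each interval of $E_j$ by a string of $j$ binary digits (0 for the left subinterval, 1 for the right one at each step) and to send the interval with a given address in $E_j$ to the interval with the same address in $E_j(r)$. The Python sketch below follows this idea for left endpoints. The addressing scheme and the function names are our own devices for illustration, and floating-point arithmetic is used, so the results are approximate.

```python
from itertools import product

def cantor_point(digits, ratio):
    """Left endpoint of the interval addressed by `digits` (0 = left child,
    1 = right child) when each child is `ratio` times as long as its parent."""
    point, length = 0.0, 1.0
    for d in digits:
        if d:
            point += (1.0 - ratio) * length  # jump to the right child
        length *= ratio
    return point

def h_r(digits, r):
    """Image under h_r of the point of K with the given address; the intervals
    of E_j(r) have length ((1 - r) / 2)**j, so the child ratio is (1 - r) / 2."""
    return cantor_point(digits, (1.0 - r) / 2.0)

def level_endpoints(j, ratio):
    """Left endpoints of the 2**j intervals at step j, listed left to right."""
    return [cantor_point(digits, ratio) for digits in product((0, 1), repeat=j)]

print(level_endpoints(2, 1.0 / 3.0))   # left endpoints of the intervals in E_2
print(level_endpoints(2, 0.25))        # corresponding endpoints in E_2(1/2)
```

Listing the addresses in lexicographic order lists the intervals from left to right, which is the ordering used to match the $m$th interval in $E_j$ with the $m$th interval in $E_j(r)$.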

Here is a variant of this which is somewhat more complicated. Define a set $J \subseteq [0, 1]$ by removing two-fifths at each step, in separate pieces, rather than one-third in one piece. Specifically, we start with $[0, 1]$, as before, and in the first step we remove the two open intervals $(1/5, 2/5)$ and $(3/5, 4/5)$. This leaves the three closed intervals $[0, 1/5]$, $[2/5, 3/5]$, and $[4/5, 1]$. In each of these one removes two "fifths" again, to get a total of 9 intervals of length $5^{-2}$. In general, after $l$ steps, there are $3^l$ closed intervals of length $5^{-l}$. If $F_l$ denotes the union of these $3^l$ intervals, then $F_{l+1} \subseteq F_l$ for all $l$ and $J$ is defined to be $\bigcap_{l=0}^\infty F_l$.

To define a mapping from $K$ to $J$, it is convenient to give a different description of $K$. Let us combine the first step in the construction of $K$ with half of the second step, as follows. In the new first step, we start with the interval $[0, 1]$, and remove two subintervals $(1/3, 2/3)$ and $(7/9, 8/9)$ to get the three closed intervals $[0, 1/3]$, $[2/3, 7/9]$, and $[8/9, 1]$. Now the intervals no longer have the same length, but we shall not be too concerned about that.

One can repeat the process on each of these three intervals, obtaining a total of 9 closed subintervals of various lengths. In general, after $l$ steps, one gets $3^l$ closed intervals of various lengths. Let us write $\widetilde{E}_l$ for the set which is the union of these $3^l$ intervals.

Each of these $3^l$ intervals in $\widetilde{E}_l$ also occurs in the $E_j$'s. More precisely, each of these subintervals occurs in an $E_j$ with $l \le j \le 2l$. It is not hard to check that

(15.28)  $E_{2l} \subseteq \widetilde{E}_l \subseteq E_l$

for all $l \ge 0$. As a result,

(15.29)  $K = \bigcap_{l=0}^\infty \widetilde{E}_l$.

This description of $K$ can be used to define a one-to-one mapping from $K$ onto $J$. As before, this mapping takes the intersection of $K$ with the $n$th interval in $\widetilde{E}_l$ to the intersection of $J$ with the $n$th interval in $F_l$, $1 \le n \le 3^l$, $l \ge 0$. It also sends 0 to 0, 1 to 1, and, in general, the endpoints of the intervals in $\widetilde{E}_l$ to the corresponding endpoints of the corresponding intervals in $F_l$.

It is not too difficult to show that this mapping from K onto J is also 
quasisymmetric, using the Euclidean metric on both K and J. 



15.4 Bounds in terms of $C\, t^{\alpha}$

Suppose that $f : \mathbf{R}^n \to \mathbf{R}^n$ is a quasisymmetric mapping (where the standard Euclidean metric is used on $\mathbf{R}^n$). Then there are positive real numbers $C$, $\alpha_1 \le 1$, and $\alpha_2 \ge 1$ such that $f$ is quasisymmetric with respect to

(15.30)  $\eta(t) = C\, t^{\alpha_1}$ for $0 \le t \le 1$,  $\eta(t) = C\, t^{\alpha_2}$ for $t \ge 1$.






Here is the basic idea. Because $f$ is quasisymmetric, there are positive real numbers $t_1 < 1$ and $L$ such that

(15.31)  $|f(y) - f(x)| \le \frac{1}{2}\, |f(z) - f(x)|$ for all $x, y, z \in \mathbf{R}^n$ such that $|y - x| \le t_1\, |z - x|$

and

(15.32)  $|f(y) - f(x)| \le L\, |f(z) - f(x)|$ for all $x, y, z \in \mathbf{R}^n$ such that $|y - x| \le 2\, |z - x|$.

One can "iterate" these statements to obtain that

(15.33)  $|f(y) - f(x)| \le 2^{-k}\, |f(z) - f(x)|$ for all $x, y, z \in \mathbf{R}^n$ such that $|y - x| \le t_1^k\, |z - x|$

and

(15.34)  $|f(y) - f(x)| \le L^k\, |f(z) - f(x)|$ for all $x, y, z \in \mathbf{R}^n$ such that $|y - x| \le 2^k\, |z - x|$

for all positive integers $k$.

To be explicit, let us consider the case where $k = 2$. Suppose that $x, y, z \in \mathbf{R}^n$ satisfy $|y - x| \le t_1^2\, |z - x|$. Let $\tilde z$ be an element of $\mathbf{R}^n$ such that

(15.35)  $|\tilde z - x| = t_1\, |z - x|$.

Then

(15.36)  $|y - x| \le t_1\, |\tilde z - x|$,

and hence

(15.37)  $|f(y) - f(x)| \le \frac{1}{2}\, |f(\tilde z) - f(x)|$

by (15.31) with $z$ replaced by $\tilde z$. Similarly,

(15.38)  $|f(\tilde z) - f(x)| \le \frac{1}{2}\, |f(z) - f(x)|$,

by (15.31) with $y$ replaced by $\tilde z$ (and $z$ kept as it is this time). Combining (15.37) and (15.38) we obtain

(15.39)  $|f(y) - f(x)| \le \frac{1}{4}\, |f(z) - f(x)|$,

as desired. One can treat the case where $|y - x| \le 2^2\, |z - x|$ similarly, using an element $\tilde z$ of $\mathbf{R}^n$ such that

(15.40)  $|\tilde z - x| = 2\, |z - x|$.
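
To see how the iterated estimates lead to power bounds, one can convert them into explicit constants. The Python sketch below does this under assumptions of ours that are not spelled out in the text: we take $L \ge 1$, use the bound $L$ for $t_1 < t \le 1$, and choose $\alpha_1 = \min(1, \log 2 / \log(1/t_1))$, $\alpha_2 = \max(1, \log L / \log 2)$, and $C = 2L$. These particular constants and the function names are one possible choice for illustration, not the ones from the text.

```python
import math

def power_bound_constants(t1, L):
    """One admissible choice of (C, alpha1, alpha2), assuming 0 < t1 < 1 <= L."""
    alpha1 = min(1.0, math.log(2.0) / math.log(1.0 / t1))
    alpha2 = max(1.0, math.log(L) / math.log(2.0))
    return 2.0 * L, alpha1, alpha2

def iterated_bound(t, t1, L):
    """Bound on |f(y) - f(x)| / |f(z) - f(x)| when |y - x| <= t |z - x|,
    obtained from the iterated estimates with factors 1/2 and L."""
    if t <= 1.0:
        if t > t1:
            return L                  # t <= 1 <= 2, so the estimate with L applies
        k = 1
        while t1 ** (k + 1) >= t:     # largest k with t <= t1**k
            k += 1
        return 2.0 ** (-k)
    k = 1
    while 2.0 ** k < t:               # smallest k with t <= 2**k
        k += 1
    return L ** k

t1, L = 0.3, 5.0
C, alpha1, alpha2 = power_bound_constants(t1, L)
for i in range(40):
    t = 0.001 * 1.3 ** i
    power = C * t ** (alpha1 if t <= 1.0 else alpha2)
    assert iterated_bound(t, t1, L) <= power + 1e-12
```

The step bounds coming from the iteration are piecewise constant in $t$, and the power functions are simply convenient continuous majorants of them.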

This argument works for quasisymmetric mappings between metric spaces in general under modest assumptions on the domain. To be more precise, suppose that we have a quasisymmetric mapping whose domain is a metric space $(M, d(x, y))$. If $(M, d(x, y))$ has the property that for each $x \in M$ and each positive real number $r$ there is a point $w \in M$ such that $d(x, w) = r$, then exactly the same argument as above goes through. One can also work with bounded metric spaces, by restricting the $r$'s to suitable ranges. In particular, these conditions hold when $M$ is connected.

In the case of the kind of Cantor sets considered in the previous section, some adjustments are needed to take care of the "gaps", and this is not too hard to do. Basically, one should be a bit more careful with the points $z$, $\tilde z$ as above. In this regard, it may be preferable to use another number instead of 2 in (15.32), and to modify the choice of $t_1$.

Although it is convenient to think in terms of the domain here, there is a natural symmetry between the domain and the range, reflected in the fact that a surjective mapping is quasisymmetric if and only if its inverse is quasisymmetric.